PREFACE


When I began writing the first edition, my intent was to write a text in time-series 
macroeconometrics. Fortunately, a number of my colleagues convinced me to broaden 
the focus. Applied microeconomists have embraced time-series methods, and the 
political science journals have become more quantitative. As in the previous editions, 
examples are drawn from macroeconomics, agricultural economics, international 
finance, and my work with Todd Sandler on the study of domestic and transnational 
terrorism. You should find that the examples in the text provide a reasonable balance 
between macroeconomic and microeconomic applications. 


1. BACKGROUND 


The text is intended for those with some background in multiple regression analysis. 
I presume the reader understands the assumptions underlying the use of ordinary least 
squares. All of my students are familiar with the concepts correlation and covariation; 
they also know how to use t-tests and F-tests in a regression framework. I use terms such 
as mean square error, significance level, and unbiased estimate without explaining their 
meaning. Two chapters of the text examine multiple time-series techniques. To work 
through these chapters, it is necessary to know how to solve a system of equations 
using matrix algebra. Chapter 1, entitled “Difference Equations,” is the cornerstone 
of the text. In my experience, this material and a knowledge of regression analysis 
are sufficient to bring students to the point where they are able to read the professional 
journals and to embark on a serious applied study. Nevertheless, one unfortunate reader 
wrote, “I did everything you said in your book, and my article still got rejected.”

Some of the techniques illustrated in the text need to be explicitly programmed. 
Structural VARs need to be estimated using a package that has the capacity to manip- 
ulate matrices. Monte Carlo methods are very computer intensive. Nonlinear models 
need to be estimated using a package that can perform nonlinear least squares and max- 
imum likelihood estimation. Completely menu-driven software packages are not able 
to estimate every form of time-series model. As I tell my students, by the time a pro- 
cedure appears on the menu of an econometric software package, it is not new. To get 
the most from the text, you should have access to a program such as EViews, RATS, 
MATLAB, R, STATA, SAS, or GAUSS. 

I take the term applied that appears in the title earnestly. Toward this end, 
I believe in teaching by induction. The method is to take a simple example and 
build toward more general and more complicated models. Detailed examples of each 
procedure are provided. Each concludes with a step-by-step summary of the stages 
typically employed in using that procedure. The approach is one of learning by doing. 
A large number of solved problems are included in the body of each chapter.
The “Questions and Exercises” section at the end of each chapter is especially 
important. You are encouraged to work through as many of the examples and exercises 
as possible. 


2. WHAT IS NEW IN THE FOURTH EDITION? 


I have tried to be careful about the trade-off between being complete and being con- 
cise. In deciding on which new topics to include in the text, I relied heavily on the 
e-mail messages I received from instructors and from students. To keep the manuscript 
from becoming encyclopedic, I have included a number of new topics in the Sup- 
plementary Manual. The new material in Chapter 2 discusses the important issue of 
combining multiple univariate forecasts so as to reduce overall forecast error vari- 
ance. Chapter 3 expands the discussion of multivariate GARCH models by illustrating 
volatility impulse response functions. In doing so, volatility spillovers need to be ana- 
lyzed in a way that is analogous to the impulse responses from a VAR. I received a 
surprisingly large number of questions regarding autoregressive distributed lag (ADL) 
models. As such, the first few parts of Chapter 5 have been rewritten so as to show the 
appropriate ways to properly identify and estimate ADLs. This new material comple- 
ments the material in Chapter 6 involving ADLs in a cointegrated system. Chapter 7 
now discusses the so-called Davies problem involving unidentified nuisance parame- 
ters under the null hypothesis. The chapter continues to discuss the issues involved with 
testing for multiple endogenous breaks (i.e., potential breaks occurring at an unknown 
date) using the Bai-Perron procedure. Moreover, since breaks can manifest themselves
slowly, the process of estimating a model with a logistic break is illustrated. 

Some content has been moved to the website for the Fourth Edition. This content 
is called out in the Table of Contents as being “online.” To locate this content, go to 
Wiley.com/College/Enders or to time-series.net. 


3. ADDITIONAL MATERIALS 


Since it was necessary to exclude some topics from the text, I prepared a Supplementary 
Manual to the text. This manual contains material that I deemed important (or interest- 
ing), but not sufficiently important for all readers, to include in the text. Often the text 
refers you to this Supplementary Manual to obtain additional information on a topic. 

To assist you in your programming, I have written a RATS Programming Manual 
to accompany this text. Of course, it is impossible for me to have versions of the guide 
for every possible platform. Most programmers should be able to transcribe a program 
written in RATS into the language used by their personal software package. 

An Instructors’ Manual is available to those adopting the text for their class. 
The manual contains the answers to all of the mathematical questions. It also contains 
programs that can be used to reproduce most of the results reported in the text and 
all of the models indicated in the “Questions and Exercises” sections. Versions of the 
manual are available for EVIEWS, RATS, SAS, and STATA users. 




I have prepared a set of PowerPoint slides for each chapter. I have prepared the
slides from the materials I use in my own class. As such, they emphasize the material
I deem most important. Moreover, some of the slides expand on the material
in the text.

Wiley makes all of the manuals available to the faculty who use the text for their 
class. The Supplementary Manual and several versions of the Programming Manual 
can be downloaded (at no charge) from the Wiley website or from my personal website: 
www.time-series.net. The Programming Manual can also be downloaded from the 
ESTIMA website: www.estima.com. 

In spite of all my efforts, some errors have undoubtedly crept into the text. If the 
first three editions are any guide, the number is embarrassingly large. I will keep an 
updated list of typos and corrections on my website www.time-series.net. 

Many people made valuable suggestions for improving the organization, style, and 
clarity of the manuscript. I received a great number of e-mails from readers who pointed 
out typos and who made very useful suggestions concerning the exposition of the text. 
I am grateful to my students who kept me challenged and were quick to point out 
errors. Especially helpful were my former students Karl Boulware, Pin Chung, Selahattin
Dibooglu, HyeJin Lee, Jing Li, Eric Olson, Ling Shao, and Jingan Yuan. Pierre
Siklos and Mark Wohar made a number of important suggestions concerning the
revised chapters for the second edition. I learned so much about time series from Barry
Falk and Junsoo Lee that they deserve a special mention. I would like to thank my lov- 
ing wife, Linda, for putting up with me during my illness (especially during the time I 
was working on the manuscript). 

Just before writing the preface to the third edition, I learned that Clive Granger had 
died. A few months before I was to take a sabbatical at the University of Minnesota, 
I had the opportunity to present a seminar at UCSD. At the time, I was working with 
overlapping generations models and had no thoughts about being an applied econo- 
metrician. However, when I first met Clive, he stated: “It will be 100 degrees warmer 
here than in Minnesota next winter. Why not do the sabbatical here?” I changed my 
plans, thinking that I would work with the math-econ types at UCSD. Fortunately,
I happened to sit through one of his classes (team-taught with Robert Engle) and fell in 
love with time-series econometrics. I know that it tickled Clive to tell people the story 
of how his class clearly changed my career. In an important way, he and Robert Engle 
are responsible for the approach taken in the text. 
CHAPTER 1


DIFFERENCE EQUATIONS 


Learning Objectives 
1. Explain how stochastic difference equations can be used for forecasting and 
illustrate how such equations can arise from familiar economic models. 


2. Explain what it means to solve a difference equation. 


3. Demonstrate how to find the solution to a stochastic difference equation using 
the iterative method. 


4. Demonstrate how to find the homogeneous solution to a difference equation. 
5. Illustrate the process of finding the homogeneous solution. 


6. Show how to find homogeneous solutions in higher order difference 
equations. 


7. Show how to find the particular solution to a deterministic difference 
equation. 

8. Explain how to use the Method of Undetermined Coefficients to find the par- 
ticular solution to a stochastic difference equation. 


9. Explain how to use lag operators to find the particular solution to a stochastic 
difference equation. 


INTRODUCTION 


The theory of difference equations underlies all of the time-series methods employed in 
later chapters of this text. It is fair to say that time-series econometrics is concerned with 
the estimation of difference equations containing stochastic components. The tradi- 
tional use of time-series analysis was to forecast the time path of a variable. Uncovering 
the dynamic path of a series improves forecasts since the predictable components of the 
series can be extrapolated into the future. The growing interest in economic dynamics 
has given a new emphasis to time-series econometrics. Stochastic difference equations 
arise quite naturally from dynamic economic models. Appropriately estimated equa- 
tions can be used for the interpretation of economic data and for hypothesis testing. 


1. TIME-SERIES MODELS 


The task facing the modern time-series econometrician is to develop reasonably 
simple models capable of forecasting, interpreting, and testing hypotheses concerning 
economic data. The challenge has grown over time; the original use of time-series 
analysis was primarily as an aid to forecasting. As such, a methodology was developed
to decompose a series into a trend, a seasonal, a cyclical, and an irregular component. 
The trend component represented the long-term behavior of the series and the cyclical 
component represented the regular periodic movements. The irregular component 
was stochastic and the goal of the econometrician was to estimate and forecast this 
component. 

Suppose you observe the fifty data points shown in Figure 1.1 and are interested 
in forecasting the subsequent values. Using the time-series methods discussed in the 
next several chapters, it is possible to decompose this series into the trend, seasonal, 
and irregular components shown in the lower panel of the figure. As you can see, the 
trend changes the mean of the series, and the seasonal component imparts a regular 
cyclical pattern with peaks occurring every twelve units of time. In practice, the trend 
and seasonal components will not be the simplistic deterministic functions shown in 
this figure. The modern view maintains that a series contains stochastic elements in 
the trend, seasonal, and irregular components. For the time being, it is wise to sidestep 
these complications so that the projection of the trend and seasonal components into 
periods 51 and beyond is straightforward. 

Notice that the irregular component, while lacking a well-defined pattern, is some- 
what predictable. If you examine the figure closely, you will see that the positive and 
negative values occur in runs; the occurrence of a large value in any period tends to 
be followed by another large value. Short-run forecasts will make use of this positive
correlation in the irregular component. Over the entire span, however, the irregular
component exhibits a tendency to revert to zero. As shown in the lower part, the projec- 
tion of the irregular component past period 50 rapidly decays toward zero. The overall 
forecast, shown in the top part of the figure, is the sum of each forecasted component. 

The general methodology used to make such forecasts entails finding the equation 
of motion driving a stochastic process and using that equation to predict subsequent 
outcomes. Let $y_t$ denote the value of a data point at period $t$; if we use this notation,
the example in Figure 1.1 assumes we observed $y_1$ through $y_{50}$. For $t = 1$ to 50, the
equations of motion used to construct components of the $y_t$ series are


Trend: $T_t = 1 + 0.1t$
Seasonal: $S_t = 1.6 \sin(t\pi/6)$
Irregular: $I_t = 0.7 I_{t-1} + \varepsilon_t$

where: $T_t$ = value of the trend component in period $t$
$S_t$ = value of the seasonal component in $t$
$I_t$ = the value of the irregular component in $t$
$\varepsilon_t$ = a pure random disturbance in $t$

Thus, the irregular disturbance in t is 70% of the previous period’s irregular disturbance 
plus a random disturbance term. 
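
The three equations of motion can be turned into a small simulation. The following is only a sketch in Python: the standard deviation of the disturbance, the starting value $I_0 = 0$, the random seed, and the function names are assumptions made for illustration, since the text does not specify them. The forecast function projects each component forward and sums the pieces, using the fact that the expected irregular component $h$ periods ahead is $(0.7)^h$ times its last observed value.

```python
import math
import random


def simulate_components(n=50, seed=0, sigma=1.0):
    """Build the trend, seasonal, and irregular series for t = 1..n.

    sigma (std. dev. of the disturbance) and the start I_0 = 0 are
    illustrative assumptions, not values taken from the text.
    """
    rng = random.Random(seed)
    trend, seasonal, irregular = [], [], []
    level = 0.0  # assumed starting value I_0 = 0
    for t in range(1, n + 1):
        trend.append(1 + 0.1 * t)                         # T_t = 1 + 0.1 t
        seasonal.append(1.6 * math.sin(t * math.pi / 6))  # S_t = 1.6 sin(t*pi/6)
        level = 0.7 * level + rng.gauss(0.0, sigma)       # I_t = 0.7 I_{t-1} + eps_t
        irregular.append(level)
    y = [a + b + c for a, b, c in zip(trend, seasonal, irregular)]
    return trend, seasonal, irregular, y


def forecast_y(n, last_irregular, horizon):
    """Sum of the projected components for periods n+1 .. n+horizon."""
    out = []
    for h in range(1, horizon + 1):
        t = n + h
        projected_irregular = (0.7 ** h) * last_irregular  # decays toward zero
        out.append(1 + 0.1 * t + 1.6 * math.sin(t * math.pi / 6) + projected_irregular)
    return out


if __name__ == "__main__":
    trend, seasonal, irregular, y = simulate_components()
    print(forecast_y(50, irregular[-1], horizon=12))
```

Because the seasonal term has a period of twelve, a twelve-step forecast traces out one full seasonal cycle around the rising trend.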

Each of these three equations is a type of difference equation. In its most gen- 
eral form, a difference equation expresses the value of a variable as a function of its 
own lagged values, time, and other variables. The trend and seasonal terms are both 
functions of time and the irregular term is a function of its own lagged value and of 
the stochastic variable $\varepsilon_t$. The reason for introducing this set of equations is to make
the point that time-series econometrics is concerned with the estimation of difference 
equations containing stochastic components. The time-series econometrician may esti- 
mate the properties of a single series or a vector containing many interdependent series. 
Both univariate and multivariate forecasting methods are presented in the text. Chapter 
2 shows how to estimate the irregular part of a series. Chapter 3 considers estimating 
the variance when the data exhibit periods of volatility and tranquility. Estimation of 
the trend is considered in Chapter 4, which focuses on the issue of whether the trend is 
deterministic or stochastic. Chapter 5 discusses the properties of a vector of stochastic 
difference equations, and Chapter 6 is concerned with the estimation of trends in a mul- 
tivariate model. Chapter 7 introduces the new and growing area of research involving 
nonlinear time-series models. 

Although forecasting has always been the mainstay of time-series analysis, the 
growing importance of economic dynamics has generated new uses for time-series 
analysis. Many economic theories have natural representations as stochastic difference 
equations. Moreover, many of these models have testable implications concerning the 
time path of a key economic variable. Consider the following four examples: 


1. The Random Walk Hypothesis: In its simplest form, the random walk 
model suggests that day-to-day changes in the price of a stock should have 
a mean value of zero. After all, if it is known that a capital gain can be made
by buying a share on day $t$ and selling it for an expected profit the very next
day, efficient speculation will drive up the current price. Similarly, no one will 
want to hold a stock if it is expected to depreciate. Formally, the model asserts 
that the price of a stock should evolve according to the stochastic difference 
equation 


$$y_{t+1} = y_t + \varepsilon_{t+1}$$

or

$$\Delta y_{t+1} = \varepsilon_{t+1}$$

where $y_t$ = the logarithm of the price of a share of stock on day $t$, and $\varepsilon_{t+1}$ =
a random disturbance term that has an expected value of zero.
Now consider the more general stochastic difference equation 


$$\Delta y_{t+1} = \alpha_0 + \alpha_1 y_t + \varepsilon_{t+1}$$

The random walk hypothesis requires the testable restriction: $\alpha_0 = \alpha_1 = 0$.
Rejecting this restriction is equivalent to rejecting the theory. Given the
information available in period $t$, the theory also requires that the mean
of $\varepsilon_{t+1}$ be equal to zero; evidence that $\varepsilon_{t+1}$ is predictable invalidates the
random walk hypothesis. Again, the appropriate estimation of this type of 
single-equation model is considered in Chapters 2 through 4. 
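
To make the random walk example concrete, the sketch below simulates a random walk and computes least-squares estimates of the intercept and slope in the more general equation. It is only an illustration in Python: the unit-variance normal shocks, the seed, and the function names are assumptions, and the point estimates alone do not constitute a test of the restriction, since appropriate estimation and inference are the subject of the later chapters.

```python
import random


def simulate_random_walk(n, seed=0):
    """Return y_0, ..., y_n with y_{t+1} = y_t + eps_{t+1} and y_0 = 0 (assumed)."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n):
        y.append(y[-1] + rng.gauss(0.0, 1.0))  # assumed N(0, 1) shocks
    return y


def ols_line(x, z):
    """Least-squares fit of z on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope


def estimate_general_equation(y):
    """Regress the change y_{t+1} - y_t on a constant and y_t."""
    dy = [y[t + 1] - y[t] for t in range(len(y) - 1)]
    return ols_line(y[:-1], dy)


if __name__ == "__main__":
    alpha0_hat, alpha1_hat = estimate_general_equation(simulate_random_walk(500))
    print(alpha0_hat, alpha1_hat)
```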


2. Reduced-Forms and Structural Equations: Often it is useful to collapse a
system of difference equations into separate single-equation models. To illus- 
trate the key issues involved, consider a stochastic version of Samuelson’s 
(1939) classic model: 


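$$y_t = c_t + i_t \tag{1.1}$$
$$c_t = \alpha y_{t-1} + \varepsilon_{ct}, \qquad 0 < \alpha < 1 \tag{1.2}$$
$$i_t = \beta(c_t - c_{t-1}) + \varepsilon_{it}, \qquad \beta > 0 \tag{1.3}$$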


where $y_t$, $c_t$, and $i_t$ denote real GDP, consumption, and investment in time
period $t$, respectively. In this Keynesian model, $y_t$, $c_t$, and $i_t$ are endogenous
variables. The previous period’s GDP and consumption, $y_{t-1}$ and $c_{t-1}$, are
called predetermined or lagged endogenous variables. The terms $\varepsilon_{ct}$ and $\varepsilon_{it}$
are zero mean random disturbances for consumption and investment, and the
coefficients $\alpha$ and $\beta$ are parameters to be estimated.

The first equation equates aggregate output (GDP) with the sum of con- 
sumption and investment spending. The second equation asserts that con- 
sumption spending is proportional to the previous period’s GDP plus a ran- 
dom disturbance term. The third equation illustrates the accelerator principle. 
Investment spending is proportional to the change in consumption; the idea is 
that growth in consumption necessitates new investment spending. The error 
terms $\varepsilon_{ct}$ and $\varepsilon_{it}$ represent the portions of consumption and investment not
explained by the behavioral equations of the model. 
Equation (1.3) is a structural equation since it expresses the endogenous
variable $i_t$ as being dependent on the current realization of another
endogenous variable, $c_t$. A reduced-form equation is one expressing the
value of a variable in terms of its own lags, lags of other endogenous variables,
current and past values of exogenous variables, and disturbance terms.
As formulated, the consumption function is already in reduced form; current
consumption depends only on lagged income and the current value of the
stochastic disturbance term $\varepsilon_{ct}$. Investment is not in reduced form because it
depends on current period consumption.

To derive a reduced-form equation for investment, substitute (1.2) into 
the investment equation to obtain 


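$$\begin{aligned}
i_t &= \beta[(\alpha y_{t-1} + \varepsilon_{ct}) - c_{t-1}] + \varepsilon_{it} \\
&= \alpha\beta y_{t-1} - \beta c_{t-1} + \beta\varepsilon_{ct} + \varepsilon_{it}
\end{aligned}$$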


Notice that the reduced-form equation for investment is not unique. You
can lag (1.2) one period to obtain: $c_{t-1} = \alpha y_{t-2} + \varepsilon_{ct-1}$. Using this expression,
the reduced-form investment equation can also be written as


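$$\begin{aligned}
i_t &= \alpha\beta y_{t-1} - \beta(\alpha y_{t-2} + \varepsilon_{ct-1}) + \beta\varepsilon_{ct} + \varepsilon_{it} \\
&= \alpha\beta(y_{t-1} - y_{t-2}) + \beta(\varepsilon_{ct} - \varepsilon_{ct-1}) + \varepsilon_{it}
\end{aligned} \tag{1.4}$$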


Similarly, a reduced-form equation for GDP can be obtained by substi- 
tuting (1.2) and (1.4) into (1.1): 


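$$\begin{aligned}
y_t &= \alpha y_{t-1} + \varepsilon_{ct} + \alpha\beta(y_{t-1} - y_{t-2}) + \beta(\varepsilon_{ct} - \varepsilon_{ct-1}) + \varepsilon_{it} \\
&= \alpha(1+\beta)y_{t-1} - \alpha\beta y_{t-2} + (1+\beta)\varepsilon_{ct} + \varepsilon_{it} - \beta\varepsilon_{ct-1}
\end{aligned}$$

so that $y_t$ can be written in the form

$$y_t = a y_{t-1} + b y_{t-2} + x_t \tag{1.5}$$

where $a = \alpha(1+\beta)$, $b = -\alpha\beta$, and $x_t = (1+\beta)\varepsilon_{ct} + \varepsilon_{it} - \beta\varepsilon_{ct-1}$.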

Equation (1.5) is a univariate reduced-form equation; $y_t$ is expressed
solely as a function of its own lags and a disturbance term. A univariate model
is particularly useful for forecasting since it enables you to predict a series
based solely on its own current and past realizations. It is possible to estimate
(1.5) using the univariate time-series techniques explained in Chapters 2
through 4. Once you have obtained estimates of $a$ and $b$, it is straightforward
to use the observed values of $y_1$ through $y_t$ to predict all future values in the
series (i.e., $y_{t+1}, y_{t+2}, \ldots$).

Chapter 5 considers the estimation of multivariate models when all vari- 
ables are treated as jointly endogenous. The chapter also discusses the restric- 
tions needed to recover (i.e., identify) the structural model from the estimated 
reduced-form model. 
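
The equivalence between the structural system (1.1) to (1.3) and the univariate reduced form (1.5) can be checked numerically. The sketch below, written in Python, simulates GDP once from the structural equations and once from (1.5) using the same shocks; the two paths should coincide up to rounding. The parameter values, starting values, normal shocks, and function names are assumptions chosen only for illustration.

```python
import random


def simulate_structural(alpha, beta, n, y0=1.0, y1=1.0, seed=0):
    """Simulate y_2..y_n from (1.1)-(1.3), given assumed starting values y_0, y_1."""
    rng = random.Random(seed)
    # eps_c[t] and eps_i[t] hold the shocks for t = 1..n (index 0 is unused).
    eps_c = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps_i = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [y0, y1]
    c_prev = alpha * y0 + eps_c[1]            # c_1 from (1.2)
    for t in range(2, n + 1):
        c = alpha * y[t - 1] + eps_c[t]       # (1.2)
        i = beta * (c - c_prev) + eps_i[t]    # (1.3)
        y.append(c + i)                       # (1.1)
        c_prev = c
    return y, eps_c, eps_i


def reduced_form_path(alpha, beta, y0, y1, eps_c, eps_i):
    """Simulate the same periods from (1.5): y_t = a y_{t-1} + b y_{t-2} + x_t."""
    a = alpha * (1 + beta)
    b = -alpha * beta
    y = [y0, y1]
    for t in range(2, len(eps_c)):
        x = (1 + beta) * eps_c[t] + eps_i[t] - beta * eps_c[t - 1]  # forcing process
        y.append(a * y[t - 1] + b * y[t - 2] + x)
    return y


if __name__ == "__main__":
    y_s, ec, ei = simulate_structural(0.8, 0.5, 30)
    y_r = reduced_form_path(0.8, 0.5, 1.0, 1.0, ec, ei)
    print(max(abs(u - v) for u, v in zip(y_s, y_r)))
```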


3. Error-Correction: Forward and Spot Prices: Certain commodities and
financial instruments can be bought and sold on the spot market (for imme- 
diate delivery) or for delivery at some specified future date. For example, 




suppose that the price of a particular foreign currency on the spot market is
$s_t$ dollars and that the price of the currency for delivery one period into the
future is $f_t$ dollars. Now, consider a speculator who purchased forward currency
at the price $f_t$ dollars per unit. At the beginning of period $t + 1$, the
speculator receives the currency and pays $f_t$ dollars per unit received. Since
spot foreign exchange can be sold at $s_{t+1}$, the speculator can earn a profit (or
loss) of $s_{t+1} - f_t$ per unit transacted.

The Unbiased Forward Rate (UFR) hypothesis asserts that expected prof- 
its from such speculative behavior should be zero. Formally, the hypothesis 
posits the following relationship between forward and spot exchange rates: 


$$s_{t+1} = f_t + \varepsilon_{t+1} \tag{1.6}$$

where $\varepsilon_{t+1}$ has a mean value of zero from the perspective of time period $t$.

In (1.6), the forward rate in $t$ is an unbiased estimate of the spot rate in
$t + 1$. Thus, suppose you collected data on the two rates and estimated the
regression

$$s_{t+1} = \alpha_0 + \alpha_1 f_t + \varepsilon_{t+1}$$

If you were able to conclude that $\alpha_0 = 0$, $\alpha_1 = 1$, and that the regression
residuals $\varepsilon_{t+1}$ have a mean value of zero from the perspective of time period $t$,
the UFR hypothesis could be maintained.

The spot and forward markets are said to be in long-run equilibrium
when $\varepsilon_{t+1} = 0$. Whenever $s_{t+1}$ turns out to differ from $f_t$, some sort of adjustment
must occur to restore the equilibrium in the subsequent period. Consider
the adjustment process

$$s_{t+2} = s_{t+1} - \alpha[s_{t+1} - f_t] + \varepsilon_{st+2}, \qquad \alpha > 0 \tag{1.7}$$
$$f_{t+1} = f_t + \beta[s_{t+1} - f_t] + \varepsilon_{ft+1}, \qquad \beta > 0 \tag{1.8}$$

where $\varepsilon_{st+2}$ and $\varepsilon_{ft+1}$ both have a mean value of zero.

Equations (1.7) and (1.8) illustrate the type of simultaneous adjustment
mechanism considered in Chapter 6. This dynamic model is called an
error-correction model because the movement of the variables in any period
is related to the previous period’s gap from long-run equilibrium. If the spot
rate $s_{t+1}$ turns out to equal the forward rate $f_t$, (1.7) and (1.8) state that the
spot rate and forward rates are expected to remain unchanged. If there is a
positive gap between the spot and forward rates so that $s_{t+1} - f_t > 0$, (1.7)
and (1.8) lead to the prediction that the spot rate will fall and the forward rate
will rise.
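
The adjustment described by (1.7) and (1.8) can be traced with a few lines of Python. This is a sketch under stated assumptions: the numerical values of the two adjustment coefficients, the initial rates, and the function names are hypothetical, and the shocks are set to zero so that only the predicted movement is shown. With zero shocks, subtracting (1.8) from (1.7) shows that each period's gap equals the previous gap multiplied by one minus the sum of the two coefficients.

```python
def ecm_step(s_next, f_now, alpha, beta, eps_s=0.0, eps_f=0.0):
    """Given s_{t+1} and f_t, return (s_{t+2}, f_{t+1}) from (1.7) and (1.8)."""
    gap = s_next - f_now                      # deviation from long-run equilibrium
    s_after = s_next - alpha * gap + eps_s    # (1.7): spot rate moves against the gap
    f_next = f_now + beta * gap + eps_f       # (1.8): forward rate moves with the gap
    return s_after, f_next


def gap_path(s_next, f_now, alpha, beta, periods):
    """Successive gaps between the spot and forward rates with all shocks set to zero."""
    gaps = [s_next - f_now]
    for _ in range(periods):
        s_next, f_now = ecm_step(s_next, f_now, alpha, beta)
        gaps.append(s_next - f_now)
    return gaps


if __name__ == "__main__":
    # Hypothetical values: a positive gap of 0.10 between spot and forward rates.
    print(ecm_step(1.10, 1.00, alpha=0.5, beta=0.25))
    print(gap_path(1.10, 1.00, alpha=0.5, beta=0.25, periods=4))
```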

4. Nonlinear Dynamics: All of the equations considered thus far are linear (in
the sense that each variable is raised to the first power) with constant coefficients.
Chapter 7 considers the estimation of models that allow for more complicated
dynamic structures. Recall that (1.3) assumes investment is always a
constant proportion of the change in consumption. It might be more realistic
to assume investment responds more to positive than to negative changes in
consumption. After all, firms might want to take advantage of positive consumption
growth but simply let the capital stock decay in response to declines
in consumption. Such behavior can be captured by modifying (1.3) such that
the coefficient on $(c_t - c_{t-1})$ is not constant. Consider the specification


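$$i_t = \beta_1(c_t - c_{t-1}) - \lambda_t\beta_2(c_t - c_{t-1}) + \varepsilon_{it}$$

where $\beta_1 > \beta_2 > 0$ and $\lambda_t$ is an indicator function such that $\lambda_t = 1$ if
$(c_t - c_{t-1}) < 0$, otherwise $\lambda_t = 0$. Hence, if $(c_t - c_{t-1}) \geq 0$, $\lambda_t = 0$
so that $i_t = \beta_1(c_t - c_{t-1}) + \varepsilon_{it}$, and if $(c_t - c_{t-1}) < 0$, $\lambda_t = 1$ so that
$i_t = (\beta_1 - \beta_2)(c_t - c_{t-1}) + \varepsilon_{it}$. Since $\beta_1 > \beta_1 - \beta_2 > 0$, investment is more
responsive to positive than negative changes in consumption.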
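
The asymmetric investment rule in the nonlinear example is easy to express as a function. The Python sketch below uses hypothetical coefficient values and a hypothetical function name; it simply evaluates the specification for a rise and for an equal-sized fall in consumption.

```python
def investment(c_now, c_prev, beta1, beta2, eps_i=0.0):
    """Investment under the threshold rule; requires beta1 > beta2 > 0."""
    change = c_now - c_prev
    lam = 1 if change < 0 else 0              # indicator: 1 only when consumption falls
    return beta1 * change - lam * beta2 * change + eps_i


if __name__ == "__main__":
    # Hypothetical coefficients: beta1 = 0.8, beta2 = 0.3.
    print(investment(11.0, 10.0, 0.8, 0.3))   # response to a one-unit rise
    print(investment(9.0, 10.0, 0.8, 0.3))    # response to a one-unit fall
```

With these values the response to a rise is governed by the first coefficient alone, while the response to a fall is governed by the difference between the two coefficients, which is smaller in absolute value.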


2. DIFFERENCE EQUATIONS AND THEIR SOLUTIONS 


Although many of the ideas in the previous section were probably familiar to you, it 
is necessary to formalize some of the concepts used. In this section, we will examine 
the type of difference equation used in econometric analysis and make explicit what 
it means to “solve” such equations. To begin our examination of difference equations, 
consider the function $y = f(t)$. If we evaluate the function when the independent variable
$t$ takes on the specific value $t^*$, we get a specific value for the dependent variable
called $y_{t^*}$. Formally, $y_{t^*} = f(t^*)$. Using this same notation, $y_{t^*+h}$ represents the value
of $y$ when $t$ takes on the specific value $t^* + h$. The first difference of $y$ is defined as
the value of the function when evaluated at $t = t^* + h$ minus the value of the function
evaluated at $t^*$:


$$\begin{aligned}
\Delta y_{t^*+h} &\equiv f(t^*+h) - f(t^*) \\
&\equiv y_{t^*+h} - y_{t^*}
\end{aligned} \tag{1.9}$$


Differential calculus allows the change in the independent variable (i.e., the
term $h$) to approach zero. Since most economic data is collected over discrete periods,
however, it is more useful to allow the length of the time period to be greater than zero.
Using difference equations, we normalize units so that $h$ represents a unit change in
$t$ (i.e., $h = 1$) and consider the sequence of equally spaced values of the independent
variable. Without any loss of generality, we can always drop the asterisk on $t^*$. We can
then form the first differences:


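$$\begin{aligned}
\Delta y_t &= f(t) - f(t-1) \equiv y_t - y_{t-1} \\
\Delta y_{t+1} &= f(t+1) - f(t) \equiv y_{t+1} - y_t \\
\Delta y_{t+2} &= f(t+2) - f(t+1) \equiv y_{t+2} - y_{t+1}
\end{aligned}$$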


Often it will be convenient to express the entire sequence of values
$\{\ldots, y_{t-2}, y_{t-1}, y_t, y_{t+1}, y_{t+2}, \ldots\}$ as $\{y_t\}$. We can then refer to any particular value in the sequence as $y_t$.
Unless specified, the index $t$ runs from $-\infty$ to $+\infty$. In time-series econometric models,
we use $t$ to represent “time” and $h$ to represent the length of a time period. Thus, $y_t$
and $y_{t+1}$ might represent the realizations of the $\{y_t\}$ sequence in the first and second
quarters of 2014, respectively.
In the same way we can form the second difference as the change in the first
difference. Consider 


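$$\begin{aligned}
\Delta^2 y_t &\equiv \Delta(\Delta y_t) = \Delta(y_t - y_{t-1}) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2} \\
\Delta^2 y_{t+1} &\equiv \Delta(\Delta y_{t+1}) = \Delta(y_{t+1} - y_t) = (y_{t+1} - y_t) - (y_t - y_{t-1}) = y_{t+1} - 2y_t + y_{t-1}
\end{aligned}$$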


The $n$th difference ($\Delta^n$) is defined analogously. At this point, we risk taking the
theory of difference equations too far. As you will see, the need to use second differ- 
ences rarely arises in time-series analysis. It is safe to say that third- and higher order 
differences are never used in applied work. 
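
First and second differences are simple to compute directly. The following Python sketch applies the definition repeatedly; the function name and the sample series are illustrative assumptions. Each pass shortens the series by one observation because the first value has no predecessor.

```python
def difference(seq, order=1):
    """Apply the difference operator `order` times to a list of values."""
    out = list(seq)
    for _ in range(order):
        # Each element becomes the current value minus the previous value.
        out = [out[k] - out[k - 1] for k in range(1, len(out))]
    return out


if __name__ == "__main__":
    y = [1, 4, 9, 16, 25]       # hypothetical series
    print(difference(y))        # first differences
    print(difference(y, 2))     # second differences
```

The second difference computed this way agrees term by term with the expanded expression $y_t - 2y_{t-1} + y_{t-2}$.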

Since most of this text considers linear time-series methods, it is possible to exam- 
ine only the special case of an nth-order linear difference equation with constant coef- 
ficients. The form for this special type of difference equation is given by 

$$y_t = a_0 + \sum_{i=1}^{n} a_i y_{t-i} + x_t \tag{1.10}$$

The order of the difference equation is given by the value of $n$. The equation is linear
because all values of the dependent variable are raised to the first power. Economic
theory may dictate instances in which the various $a_i$ are functions of variables within
the economy. However, as long as they do not depend on any of the values of $y_t$ or $x_t$,
we can regard them as parameters. The term $x_t$ is called the forcing process. The form
of the forcing process can be very general; $x_t$ can be any function of time, current and
lagged values of other variables, and/or stochastic disturbances. From an appropriate
choice of the forcing process, we can obtain a wide variety of important macroeconomic
models. Re-examine equation (1.5), the reduced-form equation for real GDP.
This equation is a second-order difference equation since $y_t$ depends on $y_{t-2}$. The forcing
process is the expression $(1+\beta)\varepsilon_{ct} + \varepsilon_{it} - \beta\varepsilon_{ct-1}$. You will note that (1.5) has no
intercept term corresponding to the expression $a_0$ in (1.10).

An important special case for the $\{x_t\}$ sequence is

$$x_t = \sum_{i=0}^{\infty} \beta_i \varepsilon_{t-i}$$

where the $\beta_i$ are constants (some of which can equal zero) and the individual elements
of the sequence $\{\varepsilon_t\}$ are not functions of the $y_t$. At this point it is useful to allow the
$\{\varepsilon_t\}$ sequence to be nothing more than a sequence of unspecified exogenous shocks.
For example, let $\{\varepsilon_t\}$ be a random error term and set $\beta_0 = 1$ and $\beta_1 = \beta_2 = \cdots = 0$; in
this case, (1.10) becomes the autoregression equation


$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + \cdots + a_n y_{t-n} + \varepsilon_t$$


Let $n = 1$, $a_0 = 0$, and $a_1 = 1$ to obtain the random walk model. Notice that
equation (1.10) can be written in terms of the difference operator ($\Delta$). Subtracting
$y_{t-1}$ from (1.10), we obtain

$$y_t - y_{t-1} = a_0 + (a_1 - 1)y_{t-1} + \sum_{i=2}^{n} a_i y_{t-i} + x_t$$

or defining $\gamma = (a_1 - 1)$, we get

$$\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=2}^{n} a_i y_{t-i} + x_t \tag{1.11}$$

Clearly, equation (1.11) is simply a modified version of (1.10). 

A solution to a difference equation expresses the value of $y_t$ as a function of the elements
of the $\{x_t\}$ sequence and $t$ (and possibly some given values of the $\{y_t\}$ sequence
called initial conditions). Examining (1.11) makes it clear that there is a strong analogy
to integral calculus, where the problem is to find a primitive function from a given
derivative. We seek to find the primitive function $f(t)$, given an equation expressed in
the form of (1.10) or (1.11). Notice that a solution is a function rather than a number.
The key property of a solution is that it satisfies the difference equation for all permissible
values of $t$ and $\{x_t\}$. Thus, the substitution of a solution into the difference equation
must result in an identity. For example, consider the simple difference equation $\Delta y_t = 2$
(or $y_t = y_{t-1} + 2$). You can easily verify that a solution to this difference equation is
$y_t = 2t + c$, where $c$ is any arbitrary constant. By definition, if $2t + c$ is a solution, it
must hold for all permissible values of $t$. Thus, for period $t - 1$, $y_{t-1} = 2(t - 1) + c$.
Now substitute the solution into the difference equation to form


$$2t + c = 2(t - 1) + c + 2 \tag{1.12}$$


It is straightforward to carry out the algebra and verify that (1.12) is an identity. 
This simple example also illustrates that the solution to a difference equation need not 
be unique; there is a solution for any arbitrary value of c. 
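
The claim that the proposed solution satisfies the difference equation for every period and for any constant can also be checked mechanically. The Python sketch below does so over a finite range of integer periods; the function names, the range, and the sample constants are arbitrary choices for illustration, and a numerical check over a finite range is of course not a substitute for the algebra in (1.12).

```python
def solution(t, c):
    """Candidate solution y_t = 2t + c."""
    return 2 * t + c


def satisfies_equation(c, periods):
    """True if y_t - y_{t-1} = 2 holds in every listed period."""
    return all(solution(t, c) - solution(t - 1, c) == 2 for t in periods)


if __name__ == "__main__":
    for c in (0, 7, -3):                      # several arbitrary constants
        print(c, satisfies_equation(c, range(-5, 6)))
```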

Another useful example is provided by the irregular term shown in Figure 1.1;
recall that the equation for this expression is: $I_t = 0.7I_{t-1} + \varepsilon_t$. You can verify that the
solution to this first-order equation is

$$I_t = \sum_{i=0}^{\infty} (0.7)^i \varepsilon_{t-i} \tag{1.13}$$


Since (1.13) holds for all time periods, the value of the irregular component in 
t — 1 is given by 


$$I_{t-1} = \sum_{i=0}^{\infty} (0.7)^i \varepsilon_{t-1-i} \tag{1.14}$$


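Now substitute (1.13) and (1.14) into $I_t = 0.7I_{t-1} + \varepsilon_t$ to obtain

$$\begin{aligned}
&\varepsilon_t + 0.7\varepsilon_{t-1} + (0.7)^2\varepsilon_{t-2} + (0.7)^3\varepsilon_{t-3} + \cdots \\
&\quad = 0.7[\varepsilon_{t-1} + 0.7\varepsilon_{t-2} + (0.7)^2\varepsilon_{t-3} + (0.7)^3\varepsilon_{t-4} + \cdots] + \varepsilon_t
\end{aligned} \tag{1.15}$$

The two sides of (1.15) are identical; this proves that (1.13) is a solution to the
first-order stochastic difference equation $I_t = 0.7I_{t-1} + \varepsilon_t$. Be aware of the distinction
between reduced-form equations and solutions. Since $I_t = 0.7I_{t-1} + \varepsilon_t$ holds for all values
of $t$, it follows that $I_{t-1} = 0.7I_{t-2} + \varepsilon_{t-1}$. Combining these two equations yields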


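$$\begin{aligned}
I_t &= 0.7[0.7I_{t-2} + \varepsilon_{t-1}] + \varepsilon_t \\
&= 0.49I_{t-2} + 0.7\varepsilon_{t-1} + \varepsilon_t
\end{aligned} \tag{1.16}$$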
Equation (1.16) is a reduced-form equation since it expresses $I_t$ in terms of its own
lags and disturbance terms. However, (1.16) does not qualify as a solution because it
contains the “unknown” value of $I_{t-2}$. To qualify as a solution, (1.16) must express $I_t$
in terms of the elements $x_t$, $t$, and any given initial conditions.
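
The difference between generating the irregular component recursively and evaluating its solution (1.13) can be illustrated numerically. The Python sketch below assumes that all shocks before the first observation are zero, so the infinite sum in (1.13) truncates to a finite one; that truncation, the normal shocks, and the function names are assumptions made for illustration only.

```python
import random


def irregular_by_recursion(shocks, rho=0.7):
    """Generate I_t = rho * I_{t-1} + eps_t, starting from an assumed I = 0."""
    level, path = 0.0, []
    for e in shocks:
        level = rho * level + e
        path.append(level)
    return path


def irregular_by_solution(shocks, t, rho=0.7):
    """Evaluate the weighted sum of current and past shocks as in (1.13).

    Shocks dated before index 0 are assumed to be zero, which truncates the sum.
    """
    return sum(rho ** i * shocks[t - i] for i in range(t + 1))


if __name__ == "__main__":
    rng = random.Random(0)
    eps = [rng.gauss(0.0, 1.0) for _ in range(20)]
    path = irregular_by_recursion(eps)
    print(max(abs(path[t] - irregular_by_solution(eps, t)) for t in range(20)))
```

The solution depends only on the shocks, whereas the recursive form, like (1.16), still refers to an earlier value of the series itself.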


3. SOLUTION BY ITERATION 


The solution given by (1.15) was simply postulated. The remaining portions of this 
chapter develop the methods you can use to obtain such solutions. Each method has its 
own merits; knowing the most appropriate to use in a particular circumstance is a skill 
that comes only with practice. This section develops the method of iteration. Although 
iteration is the most cumbersome and time-intensive method, most people find it to be 
very intuitive. 

If the value of $y$ in some specific period is known, a direct method of solution is 
to iterate forward from that period to obtain the subsequent time path of the entire $y$ 
sequence. Refer to this known value of $y$ as the initial condition or the value of $y$ in 
time period 0 (denoted by $y_0$). It is easiest to illustrate the iterative technique using the 
first-order difference equation 


$y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$ (1.17) 

Given the value of $y_0$, it follows that $y_1$ will be given by 

$y_1 = a_0 + a_1 y_0 + \varepsilon_1$ 

In the same way, $y_2$ must be 

$y_2 = a_0 + a_1 y_1 + \varepsilon_2$ 
$= a_0 + a_1[a_0 + a_1 y_0 + \varepsilon_1] + \varepsilon_2$ 
$= a_0 + a_0 a_1 + (a_1)^2 y_0 + a_1\varepsilon_1 + \varepsilon_2$ 

Continuing the process in order to find $y_3$, we obtain 

$y_3 = a_0 + a_1 y_2 + \varepsilon_3$ 
$= a_0[1 + a_1 + (a_1)^2] + (a_1)^3 y_0 + a_1^2\varepsilon_1 + a_1\varepsilon_2 + \varepsilon_3$ 


You can easily verify that for all $t > 0$, repeated iteration yields 


$y_t = a_0 \sum_{i=0}^{t-1} a_1^i + a_1^t y_0 + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i}$ (1.18) 


Equation (1.18) is a solution to (1.17) since it expresses $y_t$ as a function of $t$, the 
forcing process $x_t = \sum (a_1)^i \varepsilon_{t-i}$, and the known value of $y_0$. As an exercise, it is useful 
to show that iteration from $y_t$ back to $y_0$ yields exactly the formula given by (1.18). 
Since $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$, it follows that 

$y_t = a_0 + a_1[a_0 + a_1 y_{t-2} + \varepsilon_{t-1}] + \varepsilon_t$ 
$= a_0(1 + a_1) + a_1\varepsilon_{t-1} + \varepsilon_t + a_1^2[a_0 + a_1 y_{t-3} + \varepsilon_{t-2}]$ 


Continuing the iteration back to period 0 yields equation (1.18). 
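
The equivalence between period-by-period iteration and the closed form (1.18) can also be checked numerically. The following is a minimal sketch, not part of the original text: the parameter values, the seed, and the use of standard normal shocks are all illustrative assumptions, and the function names are hypothetical.

```python
import random

def iterate_forward(a0, a1, y0, eps):
    """Iterate y_t = a0 + a1*y_{t-1} + eps_t forward from y0; eps[0] holds eps_1."""
    path = [y0]
    for e in eps:
        path.append(a0 + a1 * path[-1] + e)
    return path

def closed_form(a0, a1, y0, eps, t):
    """Evaluate the closed-form solution (1.18) at period t."""
    geometric = sum(a1 ** i for i in range(t))
    # eps_{t-i} is stored at list position t - i - 1
    shocks = sum(a1 ** i * eps[t - i - 1] for i in range(t))
    return a0 * geometric + a1 ** t * y0 + shocks

random.seed(0)                      # arbitrary seed (assumption)
a0, a1, y0 = 2.0, 0.8, 1.0          # illustrative values, not from the text
eps = [random.gauss(0.0, 1.0) for _ in range(10)]

path = iterate_forward(a0, a1, y0, eps)
max_gap = max(abs(path[t] - closed_form(a0, a1, y0, eps, t)) for t in range(1, 11))
print(max_gap)  # differs from zero only by floating-point rounding
```

Because both routes apply the same recursion, the gap should be at the level of rounding error for any choice of $a_0$, $a_1$, and $y_0$, including values with $|a_1| > 1$.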
Iteration without an Initial Condition 


Suppose you were not given the initial condition for $y_0$. The solution given by (1.18) 
would no longer be appropriate because the value of $y_0$ is an unknown. You would not 
be able to select this initial value of $y$ and iterate forward, nor would you be able to iterate 
backward from $y_t$ and simply choose to stop at $t = t_0$. Thus, suppose we continued 
to iterate backward by substituting $a_0 + a_1 y_{-1} + \varepsilon_0$ for $y_0$ in (1.18): 


$y_t = a_0 \sum_{i=0}^{t-1} a_1^i + a_1^t (a_0 + a_1 y_{-1} + \varepsilon_0) + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i}$ 
$= a_0 \sum_{i=0}^{t} a_1^i + \sum_{i=0}^{t} a_1^i \varepsilon_{t-i} + a_1^{t+1} y_{-1}$ (1.19) 


Continuing to iterate backward another $m$ periods, we obtain 


$y_t = a_0 \sum_{i=0}^{t+m} a_1^i + \sum_{i=0}^{t+m} a_1^i \varepsilon_{t-i} + a_1^{t+m+1} y_{-m-1}$ (1.20) 


Now examine the pattern emerging from (1.19) and (1.20). If $|a_1| < 1$, the term 
$a_1^{t+m+1}$ approaches zero as $m$ approaches infinity. Also, the infinite sum 
$[1 + a_1 + (a_1)^2 + \cdots]$ converges to $1/(1 - a_1)$. Thus, if we temporarily assume that $|a_1| < 1$, 
after continual substitution, (1.20) can be written as 


$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i}$ (1.21) 
You should take a few minutes to convince yourself that (1.21) is a solution to the 
original difference equation (1.17); substitution of (1.21) into (1.17) yields an identity. 
However, (1.21) is not a unique solution. For any arbitrary value of $A$, a solution to 
(1.17) is given by 

$y_t = A a_1^t + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i}$ (1.22) 

To verify that (1.22) is a solution for any arbitrary value of $A$, substitute (1.22) 
into (1.17) to obtain 


$y_t = A a_1^t + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i}$ 
$= a_0 + a_1 \left[ A a_1^{t-1} + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-1-i} \right] + \varepsilon_t$ 

Since the two sides are identical, (1.22) is necessarily a solution to (1.17). 


Reconciling the Two Iterative Methods 


Given the iterative solution (1.22), suppose that you are now given an initial condition 
concerning the value of $y$ in the arbitrary period $t_0$. It is straightforward to show 
that we can impose the initial condition on (1.22) to yield the same solution as (1.18). 
Since (1.22) must be valid for all periods (including $t_0$), when $t = 0$, it must be true 
that 


$y_0 = A + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{-i}$ (1.23) 

so that 


$A = y_0 - a_0/(1 - a_1) - \sum_{i=0}^{\infty} a_1^i \varepsilon_{-i}$ 

Since $y_0$ is given, we can view (1.23) as the value of $A$ that renders (1.22) as a 
solution to (1.17), given the initial condition. Hence, the presence of the initial condition 
eliminates the arbitrariness of $A$. Substituting this value of $A$ into (1.22) yields 


$y_t = \left[ y_0 - a_0/(1 - a_1) - \sum_{i=0}^{\infty} a_1^i \varepsilon_{-i} \right] a_1^t + a_0/(1 - a_1) + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i}$ (1.24) 


Simplification of (1.24) results in 

$y_t = [y_0 - a_0/(1 - a_1)] a_1^t + a_0/(1 - a_1) + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i}$ (1.25) 

It is a worthwhile exercise to verify that (1.25) is identical to (1.18). 
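
One way to carry out this exercise is numerical. The sketch below evaluates (1.18) and (1.25) for the same inputs and compares them; the identity behind the match is $a_0 \sum_{i=0}^{t-1} a_1^i = a_0(1 - a_1^t)/(1 - a_1)$. The parameter values and shocks are hypothetical, chosen only for illustration, and the function names are not from the text.

```python
def solution_1_18(a0, a1, y0, eps, t):
    """(1.18): forward-iterated solution; eps[k] holds the shock for period k + 1."""
    return (a0 * sum(a1 ** i for i in range(t))
            + a1 ** t * y0
            + sum(a1 ** i * eps[t - i - 1] for i in range(t)))

def solution_1_25(a0, a1, y0, eps, t):
    """(1.25): general solution once the initial condition has pinned down A."""
    long_run = a0 / (1.0 - a1)          # requires a1 != 1
    return ((y0 - long_run) * a1 ** t
            + long_run
            + sum(a1 ** i * eps[t - i - 1] for i in range(t)))

a0, a1, y0 = 1.5, 0.6, 4.0              # illustrative values (assumption)
eps = [0.3, -0.7, 0.1, 0.9, -0.2, 0.4]  # hypothetical shocks for periods 1..6

gaps = [abs(solution_1_18(a0, a1, y0, eps, t) - solution_1_25(a0, a1, y0, eps, t))
        for t in range(len(eps) + 1)]
print(max(gaps))  # rounding error only
```

Note that (1.25) divides by $1 - a_1$, so this comparison is only defined for $a_1 \neq 1$.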


Nonconvergent Sequences 


Given that $|a_1| < 1$, (1.21) is the limiting value of (1.20) as $m$ grows infinitely large. 
What happens to the solution in other circumstances? If $|a_1| > 1$, it is not possible 
to move from (1.20) to (1.21) because the expression $|a_1|^{t+m}$ grows infinitely large as 
$t + m$ approaches $\infty$.¹ However, if there is an initial condition, there is no need to obtain 
the infinite summation. Simply select the initial condition $y_0$ and iterate forward; the 
result will be (1.18): 


$y_t = a_0 \sum_{i=0}^{t-1} a_1^i + a_1^t y_0 + \sum_{i=0}^{t-1} a_1^i \varepsilon_{t-i}$ 


Although the successive values of the $\{y_t\}$ sequence will become progressively 
larger in absolute value, all values in the series will be finite. 

A very interesting case arises if $a_1 = 1$. Rewrite (1.17) as 


$y_t = a_0 + y_{t-1} + \varepsilon_t$ 

or 

$\Delta y_t = a_0 + \varepsilon_t$ 

As you should verify by iterating from $y_t$ back to $y_0$, a solution to this equation is² 

$y_t = a_0 t + \sum_{i=1}^{t} \varepsilon_i + y_0$ (1.26) 

After a moment’s reflection, the form of the solution is quite intuitive. In every 
period $t$, the value of $y_t$ changes by $a_0 + \varepsilon_t$ units. After $t$ periods, there are $t$ such 
changes; hence, the total change is $t a_0$ plus the $t$ values of the $\{\varepsilon_t\}$ sequence. Notice 
that the solution contains the summation of all disturbances from $\varepsilon_1$ through $\varepsilon_t$. Thus, 
when $a_1 = 1$, each disturbance has a permanent non-decaying effect on the value of $y_t$. 
You should compare this result to the solution found in (1.21). For the case in which 
$|a_1| < 1$, $|a_1|^t$ is a decreasing function of $t$ so that the effects of past disturbances 
become successively smaller over time. 

The importance of the magnitude of $a_1$ is illustrated in Figure 1.2. Thirty random 
numbers with a theoretical mean equal to zero were computer-generated and denoted 
by $\varepsilon_1$ through $\varepsilon_{30}$. Then the value of $y_0$ was set equal to unity and the next 30 values of 
the $\{y_t\}$ sequence were constructed using the formula $y_t = 0.9y_{t-1} + \varepsilon_t$. The result is 
shown by the thin line in Panel (a) of Figure 1.2. If you substitute $a_0 = 0$ and $a_1 = 0.9$ 
into (1.18), you will see that the time path of $\{y_t\}$ consists of two parts. The first part, 
$0.9^t$, is shown by the slowly decaying thick line in the panel. This term dominates the 
solution for relatively small values of $t$. The influence of the random part is shown by 
the difference between the thin and the thick line; you can see that the first several 
values of $\{\varepsilon_t\}$ are negative. Notice that as $t$ increases, the influence of the initial value 
$y_0 = 1$ becomes less pronounced. 

[Figure 1.2 panel titles: (a) $y_t = 0.9y_{t-1} + \varepsilon_t$; (b) $y_t = 0.5y_{t-1} + \varepsilon_t$] 

Using the previously drawn random numbers, we again set $y_0$ equal to unity and 
a second sequence was constructed using the formula $y_t = 0.5y_{t-1} + \varepsilon_t$. This second 
sequence is shown by the thin line in Panel (b) of Figure 1.2. The influence of the 
expression $0.5^t$ is shown by the rapidly decaying thick line. Again, as $t$ increases, 
the random portion of the solution becomes more dominant in the time path of $\{y_t\}$. 
When we compare the first two panels, it is clear that reducing the magnitude of $|a_1|$ 
increases the rate of convergence. Moreover, the discrepancies between the simulated 
values of $y_t$ and the thick line are less pronounced in the second panel. As you can see 
in (1.18), each value of $\varepsilon_{t-i}$ enters the solution for $y_t$ with a coefficient of $(a_1)^i$. The 
smaller value of $a_1$ means that the past realizations of $\varepsilon_{t-i}$ have a smaller influence on 
the current value of $y_t$. 

Simulating a third sequence with $a_1 = -0.5$ yields the thin line shown in Panel (c). 
The oscillations are due to the negative value of $a_1$. The expression $(-0.5)^t$, shown by 
the thick line, is positive when $t$ is even and negative when $t$ is odd. Since $|a_1| < 1$, the 
oscillations are dampened. 

The next three panels in Figure 1.2 all show nonconvergent sequences. Each uses 
the initial condition $y_0 = 1$ and the same 30 values of $\{\varepsilon_t\}$ used in the other simulations. 
The line in Panel (d) shows the time path of $y_t = y_{t-1} + \varepsilon_t$. Since each value of $\varepsilon_t$ has an 
expected value of zero, Panel (d) illustrates a random walk process. Here $\Delta y_t = \varepsilon_t$ so 
that the change in $y_t$ is purely random. The nonconvergence is shown by the tendency 
of $\{y_t\}$ to meander. In Panel (e), the thick line representing the explosive expression 
$(1.2)^t$ dominates the random portion of the $\{y_t\}$ sequence. Also notice that the discrepancy 
between the simulated $\{y_t\}$ sequence and the thick line widens as $t$ increases. The 
reason is that past values of $\varepsilon_{t-i}$ enter the solution for $y_t$ with the coefficient $(1.2)^i$. 
As $i$ increases, these previous discrepancies become increasingly 
important. Similarly, setting $a_1 = -1.2$ results in the exploding oscillations shown in 
the lower-right panel of the figure. The value $(-1.2)^t$ is positive for even values of $t$ and 
negative for odd values of t. 
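
The experiment behind Figure 1.2 is easy to re-create in outline. The sketch below is not the author's code and will not reproduce the figure's exact paths: the original 30 draws are not available, so standard normal shocks with an arbitrary seed are assumed. It feeds one common set of shocks through $y_t = a_1 y_{t-1} + \varepsilon_t$ for the six values of $a_1$ discussed above and compares the simulated value at $t = 30$ with the deterministic part $a_1^t y_0$ (the thick line).

```python
import random

def simulate(a1, shocks, y0=1.0):
    """Return [y_0, ..., y_T] for y_t = a1*y_{t-1} + eps_t (a0 = 0)."""
    path = [y0]
    for eps in shocks:
        path.append(a1 * path[-1] + eps)
    return path

random.seed(42)                                        # arbitrary seed (assumption)
shocks = [random.gauss(0.0, 1.0) for _ in range(30)]   # N(0, 1) draws are an assumption

coefficients = [0.9, 0.5, -0.5, 1.0, 1.2, -1.2]        # the six cases in the text
paths = {a1: simulate(a1, shocks) for a1 in coefficients}

for a1 in coefficients:
    thick = a1 ** 30                                   # deterministic part at t = 30 (y0 = 1)
    print(f"a1 = {a1:5.1f}   y_30 = {paths[a1][30]:12.3f}   a1^30 = {thick:12.3f}")
```

Using the same shocks for every value of $a_1$ mirrors the design in the text: any difference between the paths is then due to the coefficient alone rather than to different random draws.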


4. AN ALTERNATIVE SOLUTION METHODOLOGY 


Solution by the iterative method breaks down in higher order equations. The algebraic 
complexity quickly overwhelms any reasonable attempt to find a solution. Fortunately, 
there are several alternative solution techniques that can be helpful in solving the 
nth-order equation given by (1.10). If we use the principle that you should learn to walk 
before you learn to run, it is best to step through the first-order equation given by (1.17). 
Although you will be covering some familiar ground, the first-order case illustrates the 
general methodology extremely well. To split the procedure into its component parts, 
consider only the homogeneous portion of (1.17)³ 

$y_t = a_1 y_{t-1}$ (1.27) 


The solution to this homogeneous equation is called the homogeneous solution; at 
times it will be useful to denote the homogeneous solution by the expression $y_t^h$. Obviously, 
the trivial solution $y_t = y_{t-1} = \cdots = 0$ satisfies (1.27). However, this solution is 
not unique. By setting $a_0$ and all values of $\{\varepsilon_t\}$ equal to zero, (1.18) becomes $y_t = a_1^t y_0$. 
Hence, $y_t = a_1^t y_0$ must be a solution to (1.27). Yet, even this solution does not constitute 
the full set of solutions. It is easy to verify that the expression $a_1^t$ multiplied by any 
arbitrary constant $A$ satisfies (1.27). Simply substitute $y_t = A a_1^t$ and $y_{t-1} = A a_1^{t-1}$ into 
(1.27) to obtain 


$A a_1^t = a_1 A a_1^{t-1}$ 


Since $a_1^t = a_1 a_1^{t-1}$, it follows that $y_t = A a_1^t$ also solves (1.27). With the aid of the 
thick lines in Figure 1.2, we can classify the properties of the homogeneous solution 
as follows: 



1. If $|a_1| < 1$, the expression $a_1^t$ converges to zero as $t$ approaches infinity. Convergence 
is direct if $0 < a_1 < 1$ and oscillatory if $-1 < a_1 < 0$. 

2. If $|a_1| > 1$, the homogeneous solution is not stable. If $a_1 > 1$, the homogeneous 
solution approaches $\infty$ as $t$ increases. If $a_1 < -1$, the homogeneous 
solution oscillates explosively. 

3. If $a_1 = 1$, any arbitrary constant $A$ satisfies the homogeneous equation $y_t = y_{t-1}$. 
If $a_1 = -1$, the system is meta-stable: $a_1^t = 1$ for even values of $t$ and $-1$ 
for odd values of t. 
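
The three cases in the list can be collected into a small helper. This is only an illustrative sketch with a hypothetical function name; the labels paraphrase the list, and the branch for $a_1 = 0$ is an addition for completeness that the list does not discuss.

```python
def classify_homogeneous(a1):
    """Describe the time path of A * a1**t for a nonzero constant A."""
    if abs(a1) < 1:
        if a1 > 0:
            return "converges directly"
        if a1 < 0:
            return "converges with oscillations"
        return "zero after the initial period"   # a1 == 0, not covered in the list
    if abs(a1) > 1:
        return "grows without bound" if a1 > 0 else "oscillates explosively"
    if a1 == 1:
        return "constant at A"
    return "meta-stable: alternates between A and -A"

for a1 in (0.9, -0.5, 1.2, -1.2, 1.0, -1.0):
    print(a1, "->", classify_homogeneous(a1))
```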


Now consider (1.17) in its entirety. In the last section, you confirmed that (1.21) is 
a valid solution to (1.17). Equation (1.21) is called a particular solution to the difference 
equation; all such particular solutions will be denoted by the term $y_t^p$. The term 
“particular” stems from the fact that a solution to a difference equation may not be 
unique; hence, (1.21) is just one particular solution out of the many possibilities. 

In moving to (1.22) you verified that the particular solution was not unique. The 
homogeneous solution $A a_1^t$ plus the particular solution given by (1.21) constituted the 
complete solution to (1.17). The general solution to a difference equation is defined 
to be a particular solution plus all homogeneous solutions. Once the general solution 
is obtained, the arbitrary constant $A$ can be eliminated by imposing an initial condition 
for $y_0$. 


The Solution Methodology 


The results of the first-order case are directly applicable to the nth-order equation given 
by (1.10). In this general case, it will be more difficult to find the particular solution and 
there will be n distinct homogeneous solutions. Nevertheless, the solution methodology 
will always entail the following four steps: 


STEP 1: form the homogeneous equation and find all n homogeneous solutions; 

STEP 2: find a particular solution; 

STEP 3: obtain the general solution as the sum of the particular solution and a linear 
combination of all homogeneous solutions; 

STEP 4: eliminate the arbitrary constant(s) by imposing the initial condition(s) on the 
general solution. 


Before we address the various techniques that can be used to obtain homogeneous 
and particular solutions, it is worthwhile to illustrate the methodology using the 
equation 


$y_t = 0.9y_{t-1} - 0.2y_{t-2} + 3$ (1.28) 

Clearly, this second-order equation is in the form of (1.10) with $a_0 = 3$, $a_1 = 0.9$, 
$a_2 = -0.2$, and $x_t = 0$. Beginning with the first of the four steps, form the homogeneous 
equation 

$y_t - 0.9y_{t-1} + 0.2y_{t-2} = 0$ (1.29) 


In the first-order case of (1.17), the homogeneous solution was $A a_1^t$. Section 1.6 will 
show you how to find the complete set of homogeneous solutions. For now, it is sufficient 
to assert that the two homogeneous solutions are $y_{1t}^h = (0.5)^t$ and $y_{2t}^h = (0.4)^t$. 
To verify the first solution, note that $y_{1t-1}^h = (0.5)^{t-1}$ and $y_{1t-2}^h = (0.5)^{t-2}$. Thus, $y_{1t}^h$ is 
a solution if it satisfies 


$(0.5)^t - 0.9(0.5)^{t-1} + 0.2(0.5)^{t-2} = 0$ 

If we divide by $(0.5)^{t-2}$, the issue is whether 

$(0.5)^2 - 0.9(0.5) + 0.2 = 0$ 

Carrying out the algebra, $0.25 - 0.45 + 0.2$ does equal zero so that $(0.5)^t$ is a solution 
to (1.29). In the same way, it is easy to verify that $y_{2t}^h = (0.4)^t$ is a solution since 

$(0.4)^t - 0.9(0.4)^{t-1} + 0.2(0.4)^{t-2} = 0$ 


Divide by $(0.4)^{t-2}$ to obtain $(0.4)^2 - 0.9(0.4) + 0.2 = 0.16 - 0.36 + 0.2 = 0$. 

The second step is to obtain a particular solution; you can easily confirm that the 
particular solution $y_t^p = 10$ solves (1.28) as $10 = 0.9(10) - 0.2(10) + 3$. 

The third step is to combine the particular solution and a linear combination of 
both homogeneous solutions to obtain 


$y_t = A_1(0.5)^t + A_2(0.4)^t + 10$ 


where $A_1$ and $A_2$ are arbitrary constants. 

For the fourth step, assume you have two initial conditions for the $\{y_t\}$ sequence. 
So that we can keep our numbers reasonably round, suppose that $y_0 = 13$ and $y_1 = 11.3$. 
Thus, for periods zero and one, our solution must satisfy 


$13 = A_1 + A_2 + 10$ 
$11.3 = A_1(0.5) + A_2(0.4) + 10$ 

Solving simultaneously for $A_1$ and $A_2$, you should find $A_1 = 1$ and $A_2 = 2$. Hence, the 
solution is 

$y_t = (0.5)^t + 2(0.4)^t + 10$ 

You can substitute $y_t = (0.5)^t + 2(0.4)^t + 10$ into (1.28) to verify that the solution is 
correct. 
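
The fourth step and the final verification can be sketched in a few lines. The code below is an illustration rather than a general solver: it takes the two homogeneous roots (0.5 and 0.4) and the particular solution (10) from the example as given, solves the two initial-condition equations for $A_1$ and $A_2$ by Cramer's rule, and then checks the resulting formula against direct iteration of (1.28). The function names are hypothetical.

```python
def solve_constants(y0, y1, r1=0.5, r2=0.4, particular=10.0):
    """Solve y0 = A1 + A2 + p and y1 = A1*r1 + A2*r2 + p for (A1, A2)."""
    c0, c1 = y0 - particular, y1 - particular
    det = r2 - r1                       # determinant of [[1, 1], [r1, r2]]
    A1 = (c0 * r2 - c1) / det
    A2 = (c1 - c0 * r1) / det
    return A1, A2

def closed_form(t, A1, A2):
    """General solution of (1.28) with the constants filled in."""
    return A1 * 0.5 ** t + A2 * 0.4 ** t + 10.0

A1, A2 = solve_constants(13.0, 11.3)

# Iterate (1.28) directly from the two initial conditions.
path = [13.0, 11.3]
for t in range(2, 12):
    path.append(0.9 * path[t - 1] - 0.2 * path[t - 2] + 3.0)

max_gap = max(abs(path[t] - closed_form(t, A1, A2)) for t in range(12))
print(round(A1, 6), round(A2, 6), max_gap)
```

Up to rounding, the constants come out as $A_1 = 1$ and $A_2 = 2$, and the iterated path agrees with the closed form in every period checked.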


Generalizing the Method 


To show that this method is applicable to higher order equations, consider the homogeneous 
part of (1.10): 


$y_t = \sum_{i=1}^{n} a_i y_{t-i}$ (1.30) 


As shown in Section 1.6, there are n homogeneous solutions that satisfy (1.30). For 
now, it is sufficient to demonstrate the following proposition: If $y_t^h$ is a homogeneous 
solution to (1.30), $A y_t^h$ is also a solution for any arbitrary constant $A$. By assumption, 
$y_t^h$ solves the homogeneous equation so that 


$y_t^h = \sum_{i=1}^{n} a_i y_{t-i}^h$ (1.31) 


The expression $A y_t^h$ is also a solution if 

$A y_t^h = \sum_{i=1}^{n} a_i A y_{t-i}^h$ (1.32) 

We know (1.32) is satisfied because dividing each term by $A$ yields (1.31). Now 
suppose that there are two separate solutions to the homogeneous equation denoted by 
$y_{1t}^h$ and $y_{2t}^h$. It is straightforward to show that for any two constants $A_1$ and $A_2$, the linear 
combination $A_1 y_{1t}^h + A_2 y_{2t}^h$ is also a solution to the homogeneous equation. If 
$A_1 y_{1t}^h + A_2 y_{2t}^h$ is a solution to (1.30), it must satisfy 


$A_1 y_{1t}^h + A_2 y_{2t}^h = a_1[A_1 y_{1t-1}^h + A_2 y_{2t-1}^h] + a_2[A_1 y_{1t-2}^h + A_2 y_{2t-2}^h] + \cdots$ 
$+ a_n[A_1 y_{1t-n}^h + A_2 y_{2t-n}^h]$ 


Regrouping terms, we want to know if 


$A_1 \left[ y_{1t}^h - \sum_{i=1}^{n} a_i y_{1t-i}^h \right] + A_2 \left[ y_{2t}^h - \sum_{i=1}^{n} a_i y_{2t-i}^h \right] = 0$ 


Since $A_1 y_{1t}^h$ and $A_2 y_{2t}^h$ are separate solutions to (1.30), each of the expressions in 
brackets is zero. Hence, the linear combination is necessarily a solution to the homogeneous 
equation. This result easily generalizes to all n homogeneous solutions of an 
nth-order equation. 

Finally, the use of Step 3 is appropriate since the sum of any particular solution 
and any linear combination of all homogeneous solutions is also a solution. To prove 
this proposition, substitute the sum of the particular and homogeneous solutions into 
(1.10) to obtain 


$y_t^p + y_t^h = a_0 + \sum_{i=1}^{n} a_i (y_{t-i}^p + y_{t-i}^h) + x_t$ (1.33) 

Recombining the terms in (1.33), we want to know if 

$\left[ y_t^p - a_0 - \sum_{i=1}^{n} a_i y_{t-i}^p - x_t \right] + \left[ y_t^h - \sum_{i=1}^{n} a_i y_{t-i}^h \right] = 0$ (1.34) 


Since $y_t^p$ solves (1.10), the expression in the first bracket of (1.34) is zero. Since $y_t^h$ 
solves the homogeneous equation, the expression in the second bracket is zero. Thus, 
(1.34) is an identity; the sum of the homogeneous and particular solutions solves (1.10). 


5. THE COBWEB MODEL 


An interesting way to illustrate the methodology outlined in the previous section is to 
consider a stochastic version of the traditional cobweb model. Since the model was 
originally developed to explain the volatility in agricultural prices, let the market for a 
product (say, wheat) be represented by 


$d_t = a - \gamma p_t$, $\gamma > 0$ (1.35) 
$s_t = b + \beta p_t^* + \varepsilon_t$, $\beta > 0$ (1.36) 
$s_t = d_t$ (1.37) 


where: $d_t$ = demand for wheat in period $t$ 
$s_t$ = supply of wheat in $t$ 
$p_t$ = market price of wheat in $t$ 
$p_t^*$ = price that farmers expect to prevail at $t$ 
$\varepsilon_t$ = a zero mean stochastic supply shock 
and parameters $a$, $b$, $\gamma$, and $\beta$ are all positive such that $a > b$.⁴ 
The nature of the model is such that consumers buy as much wheat as is desired 
at the market clearing price $p_t$. At planting time, farmers do not know the price prevailing 
at harvest time; they base their supply decision on the expected price ($p_t^*$). The 
actual quantity produced depends on the planned quantity $b + \beta p_t^*$ plus a random supply 
shock $\varepsilon_t$. Once the product is harvested, market equilibrium requires that the quantity 
supplied equals the quantity demanded. Unlike the actual market for wheat, the model 
does not allow for the possibility of storage. The essence of the cobweb model is that 
farmers form their expectations in a naive fashion; let farmers use last year’s price as 
the expected market price 

$p_t^* = p_{t-1}$ (1.38) 


Point $E$ in Figure 1.3 represents the long-run equilibrium price and quantity combination. 
Note that the equilibrium concept in this stochastic model differs from that 
of the traditional cobweb model. If the system is stable, successive prices will tend to 
converge to point $E$. However, the nature of the stochastic equilibrium is such that the 
ever-present supply shocks prevent the system from remaining at $E$. Nevertheless, it is 
useful to solve for the long-run price. If we set all values of the $\{\varepsilon_t\}$ sequence equal to 
zero, set $p_t = p_{t-1} = \cdots = p$, and equate supply and demand, the long-run equilibrium 
price is given by $p = (a - b)/(\gamma + \beta)$. Similarly, the equilibrium quantity ($s$) is given 
by $s = (a\beta + \gamma b)/(\gamma + \beta)$. 

FIGURE 1.3 The Cobweb Model 

To understand the dynamics of the system, suppose that farmers in $t$ plan to produce 
the equilibrium quantity $s$. However, let there be a negative supply shock such that 
the actual quantity produced turns out to be $s_t$. As shown by point 1 in Figure 1.3, 
consumers are willing to pay $p_t$ for the quantity $s_t$; hence, market equilibrium in $t$ occurs 
at point 1. Updating one period allows us to see the main result of the cobweb model. 
For simplicity, assume that all subsequent values of the supply shock are zero (i.e., 
$\varepsilon_{t+1} = \varepsilon_{t+2} = \cdots = 0$). At the beginning of period $t+1$, farmers expect the price at 
harvest time to be the price of the previous period; thus, $p_{t+1}^* = p_t$. Accordingly, they 
produce quantity $s_{t+1}$ (see point 2 in the figure); consumers, however, are willing to buy 
quantity $s_{t+1}$ only if the price falls to that indicated by $p_{t+1}$ (see point 3 in the figure). 
The next period begins with farmers expecting to be at point 4. The process continually 
repeats until the equilibrium point $E$ is attained. 

As drawn, Figure 1.3 suggests that the market will always converge to the long-run 
equilibrium point. This result does not hold for all demand and supply curves. To formally 
derive the stability condition, combine (1.35) through (1.38) to obtain 


$b + \beta p_{t-1} + \varepsilon_t = a - \gamma p_t$ 

or 

$p_t = (-\beta/\gamma) p_{t-1} + (a - b)/\gamma - \varepsilon_t/\gamma$ (1.39) 


Clearly, (1.39) is a stochastic first-order linear difference equation with constant 
coefficients. To obtain the general solution, proceed using the four steps listed at the 
end of the last section: 


1. Form the homogeneous equation $p_t = (-\beta/\gamma) p_{t-1}$. In the next section you 
will learn how to find the solution(s) to a homogeneous equation. For now, it 
is sufficient to verify that the homogeneous solution is 

$p_t^h = A(-\beta/\gamma)^t$ 


where $A$ is an arbitrary constant. 


2. Note that (1.39) is a first-order difference equation in the form $p_t = a_0 + a_1 p_{t-1} + e_t$, 
where $a_0 = (a - b)/\gamma$, $a_1 = -(\beta/\gamma)$, and $e_t = -\varepsilon_t/\gamma$. If the ratio 
$\beta/\gamma$ is less than unity, you can iterate (1.39) backward from $p_t$ to verify that 
the particular solution for the price is 


$p_t^p = \frac{a - b}{\gamma + \beta} - \frac{1}{\gamma} \sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{t-i}$ (1.40) 


If $\beta/\gamma > 1$, the infinite summation in (1.40) is not convergent. As discussed 
in the last section, it is necessary to impose an initial condition on (1.40) if 
$\beta/\gamma \geq 1$. 

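3. The general solution is the sum of the homogeneous and particular solutions; 
combining these two solutions, the general solution is 


$p_t = \frac{a - b}{\gamma + \beta} - \frac{1}{\gamma} \sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{t-i} + A(-\beta/\gamma)^t$ (1.41) 
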

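4. In (1.41), $A$ is an arbitrary constant that can be eliminated if we know the 
price in some initial period. For convenience, let this initial period have a time 
subscript of zero. Since the solution must hold for every period, including 
period zero, it must be the case that 


$p_0 = \frac{a - b}{\gamma + \beta} - \frac{1}{\gamma} \sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{-i} + A(-\beta/\gamma)^0$ 


Since $(-\beta/\gamma)^0 = 1$, the value of $A$ is given by 


$A = p_0 - \frac{a - b}{\gamma + \beta} + \frac{1}{\gamma} \sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{-i}$ 


Substituting this solution for $A$ back into (1.41) yields 


$p_t = \frac{a - b}{\gamma + \beta} - \frac{1}{\gamma} \sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{t-i} + (-\beta/\gamma)^t \left[ p_0 - \frac{a - b}{\gamma + \beta} + \frac{1}{\gamma} \sum_{i=0}^{\infty} (-\beta/\gamma)^i \varepsilon_{-i} \right]$ 


and, after simplifying the two summations, 


$p_t = \frac{a - b}{\gamma + \beta} - \frac{1}{\gamma} \sum_{i=0}^{t-1} (-\beta/\gamma)^i \varepsilon_{t-i} + (-\beta/\gamma)^t \left[ p_0 - \frac{a - b}{\gamma + \beta} \right]$ (1.42) 

We can interpret (1.42) in terms of Figure 1.3. In order to focus on the stability of 
the system, temporarily assume that all values of the $\{\varepsilon_t\}$ sequence are zero. Subsequently, 
we will return to a consideration of the effects of supply shocks. If the system 
begins in long-run equilibrium, the initial condition is such that $p_0 = (a - b)/(\gamma + \beta)$. 
In this case, inspection of equation (1.42) indicates that $p_t = (a - b)/(\gamma + \beta)$. Thus, if 
we begin the process at point $E$, the system remains in long-run equilibrium. 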

Instead, suppose that the process begins at a price below long-run equilibrium: 
$p_0 < (a - b)/(\gamma + \beta)$. Equation (1.42) tells us that $p_1$ is 


$p_1 = (a - b)/(\gamma + \beta) + [p_0 - (a - b)/(\gamma + \beta)](-\beta/\gamma)^1$ (1.43) 


Since $p_0 < (a - b)/(\gamma + \beta)$ and $-\beta/\gamma < 0$, it follows that $p_1$ will be above the 
long-run equilibrium price $(a - b)/(\gamma + \beta)$. In period 2, 


$p_2 = (a - b)/(\gamma + \beta) + [p_0 - (a - b)/(\gamma + \beta)](-\beta/\gamma)^2$ 


Although $p_0 < (a - b)/(\gamma + \beta)$, $(-\beta/\gamma)^2$ is positive; hence, $p_2$ is below the 
long-run equilibrium. For the subsequent periods, note that $(-\beta/\gamma)^t$ will be positive 
for even values of $t$ and negative for odd values of $t$. Just as we found graphically, the 
successive values of the $\{p_t\}$ sequence will oscillate above and below the long-run 
equilibrium price. Since $(\beta/\gamma)^t$ goes to zero if $\beta < \gamma$ and explodes if $\beta > \gamma$, the magnitude 
of $\beta/\gamma$ determines whether the price actually converges toward the long-run 
equilibrium. If $\beta/\gamma < 1$, the oscillations will diminish in magnitude, and if $\beta/\gamma > 1$, 
the oscillations will be explosive. 

The economic interpretation of this stability condition is straightforward. The 
slope of the supply curve (i.e., $\partial p_t/\partial s_t$) is $1/\beta$ and the absolute value of the slope 
of the demand curve [i.e., $-\partial p_t/\partial(d_t)$] is $1/\gamma$. If the supply curve is steeper than the 
demand curve, it must be the case that $1/\beta > 1/\gamma$ or $\beta/\gamma < 1$ so that the system is 
stable. This is precisely the case illustrated in Figure 1.3. As an exercise, you should 
draw a diagram with the demand curve steeper than the supply curve and show that 
the price oscillates and diverges from the long-run equilibrium. 
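
The stability condition can also be explored numerically before drawing the diagram. The sketch below iterates the price equation $p_t = (-\beta/\gamma)p_{t-1} + (a - b)/\gamma - \varepsilon_t/\gamma$ with all shocks set to zero, once with $\beta < \gamma$ and once with $\beta > \gamma$. All parameter values are hypothetical and chosen only so that $a > b$ and the starting price lies below the long-run price; the function names are not from the text.

```python
def cobweb_path(a, b, gamma, beta, p0, shocks):
    """Iterate p_t = (-beta/gamma)*p_{t-1} + (a - b)/gamma - eps_t/gamma."""
    slope = -beta / gamma
    prices = [p0]
    for eps in shocks:
        prices.append(slope * prices[-1] + (a - b) / gamma - eps / gamma)
    return prices

def long_run_price(a, b, gamma, beta):
    """Long-run equilibrium price (a - b)/(gamma + beta)."""
    return (a - b) / (gamma + beta)

a, b, p0 = 10.0, 2.0, 2.0              # hypothetical values with a > b
no_shocks = [0.0] * 12

stable = cobweb_path(a, b, gamma=2.0, beta=1.0, p0=p0, shocks=no_shocks)     # beta/gamma = 0.5
explosive = cobweb_path(a, b, gamma=1.0, beta=2.0, p0=p0, shocks=no_shocks)  # beta/gamma = 2

p_bar = long_run_price(a, b, 2.0, 1.0)  # 8/3 in both cases, since gamma + beta = 3
print([round(p - p_bar, 4) for p in stable[:6]])      # deviations alternate in sign and shrink
print([round(p - p_bar, 4) for p in explosive[:6]])   # deviations alternate in sign and grow
```

In the explosive case the simulated price eventually turns negative; the linear model places no floor on prices, so the run should be read as an illustration of divergence rather than as a realistic price path.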

Now consider the effects of the supply shocks. The contemporaneous effect of 
a supply shock on the price of wheat is the partial derivative of $p_t$ with respect to $\varepsilon_t$; 
from (1.42) 

$\partial p_t/\partial \varepsilon_t = -1/\gamma$ (1.44) 

Equation (1.44) is called the impact multiplier since it shows the impact effect of 
a change in $\varepsilon_t$ on the price in $t$. In terms of Figure 1.3, a negative value of $\varepsilon_t$ implies a 
price above the long-run price $p$; the price in $t$ rises by $1/\gamma$ units for each unit decline in 
current period supply. Of course, this terminology is not specific to the cobweb model; 
in terms of the nth-order model given by (1.10), the impact multiplier is the partial 
derivative of $y_t$ with respect to the partial change in the forcing process. 

The effects of the supply shock in $t$ persist into future periods. Updating (1.42) by 
one period yields the one-period multiplier: 


$\partial p_{t+1}/\partial \varepsilon_t = -(1/\gamma)(-\beta/\gamma) = \beta/\gamma^2$ 

Point 3 in Figure 1.3 illustrates how the price in $t+1$ is affected by the negative 
supply shock in $t$. It is straightforward to derive the result that the effects of the supply 
shock decay over time. Since $\beta/\gamma < 1$, the absolute value of $\partial p_t/\partial \varepsilon_t$ exceeds $\partial p_{t+1}/\partial \varepsilon_t$. 
All of the multipliers can be derived analogously; updating (1.42) by two periods 


$\partial p_{t+2}/\partial \varepsilon_t = -(1/\gamma)(-\beta/\gamma)^2$ 


and after n periods: 


$\partial p_{t+n}/\partial \varepsilon_t = -(1/\gamma)(-\beta/\gamma)^n$ 


The time path of all such multipliers is called the impulse response function. This 
function has many important applications in time-series analysis because it shows how 
the entire time path of a variable is affected by a stochastic shock. Here, the impulse 
function traces the effects of a supply shock in the wheat market. In other economic 
applications, you may be interested in the time path of a money supply shock or a 
productivity shock on real GDP. 
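
For the cobweb model, the impulse response function is simple enough to tabulate directly. The sketch below computes the multipliers $-(1/\gamma)(-\beta/\gamma)^n$ and cross-checks them by simulating the price equation twice, once with no shocks and once with a single unit supply shock, and differencing the two paths. The parameter values ($\gamma = 2$, $\beta = 1$, $a = 10$, $b = 2$) are hypothetical and the function names are not from the text.

```python
def impulse_response(gamma, beta, horizon):
    """Multipliers dp_{t+n}/d eps_t = -(1/gamma) * (-beta/gamma)**n for n = 0..horizon."""
    return [-(1.0 / gamma) * (-beta / gamma) ** n for n in range(horizon + 1)]

def simulate_prices(a, b, gamma, beta, p0, shocks):
    """Iterate p_t = (-beta/gamma)*p_{t-1} + (a - b)/gamma - eps_t/gamma."""
    prices = [p0]
    for eps in shocks:
        prices.append((-beta / gamma) * prices[-1] + (a - b) / gamma - eps / gamma)
    return prices

a, b, gamma, beta = 10.0, 2.0, 2.0, 1.0     # hypothetical values with beta/gamma < 1
p_bar = (a - b) / (gamma + beta)            # start both runs at the long-run price
horizon = 5

baseline = simulate_prices(a, b, gamma, beta, p_bar, [0.0] * (horizon + 1))
shocked = simulate_prices(a, b, gamma, beta, p_bar, [1.0] + [0.0] * horizon)

responses = [s - base for s, base in zip(shocked[1:], baseline[1:])]
multipliers = impulse_response(gamma, beta, horizon)
print(multipliers[:4])  # prints [-0.5, 0.25, -0.125, 0.0625]
```

The simulated differences in `responses` agree with `multipliers` up to rounding: a unit supply shock lowers the price by $1/\gamma$ on impact, and the effect then alternates in sign and decays because $\beta/\gamma < 1$.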

In actuality, the function can be derived without updating (1.42) because it is 
always the case that 

$\partial p_{t+j}/\partial \varepsilon_t = \partial p_t/\partial \varepsilon_{t-j}$ 


To find the impulse response function, simply find the partial derivative of (1.42) with 
respect to the various $\varepsilon_{t-j}$. These partial derivatives are nothing more than the coefficients 
of the $\{\varepsilon_{t-i}\}$ sequence in (1.42). 

Each of the three components in (1.42) has a direct economic interpretation. The 
deterministic portion of the particular solution $(a - b)/(\gamma + \beta)$ is the long-run equilibrium 
price; if the stability condition is met, the $\{p_t\}$ sequence tends to converge 
to this long-run value. The stochastic component of the particular solution captures 
the short-run price adjustments due to the supply shocks. The ultimate decay of the 
coefficients of the impulse response function guarantees that the effects of changes 
in the various $\varepsilon_t$ are of a short-run duration. The third component is the expression 
$(-\beta/\gamma)^t A = (-\beta/\gamma)^t [p_0 - (a - b)/(\gamma + \beta)]$. The value of $A$ is the initial period’s deviation 
of the price from its long-run equilibrium level. Given that $\beta/\gamma < 1$, the importance 
of this initial deviation diminishes over time. 


6. SOLVING HOMOGENEOUS DIFFERENCE 
EQUATIONS 


Higher order difference equations arise quite naturally in economic analysis. 
Equation (1.5), the reduced-form GDP equation resulting from Samuelson’s (1939) 
model, is an example of a second-order difference equation. Moreover, in time-series 
econometrics it is quite typical to estimate second- and higher order equations. To 
begin our examination of homogeneous solutions, consider the second-order equation 


$y_t - a_1 y_{t-1} - a_2 y_{t-2} = 0$ (1.45) 

Given the findings in the first-order case, you should suspect that the homogeneous 
solution has the form $y_t^h = A\alpha^t$. Substitution of this trial solution into (1.45) yields 


$A\alpha^t - a_1 A\alpha^{t-1} - a_2 A\alpha^{t-2} = 0$ (1.46) 

Clearly, any arbitrary value of $A$ is satisfactory. If you divide (1.46) by $A\alpha^{t-2}$, the problem 
is to find the values of $\alpha$ that satisfy 


$\alpha^2 - a_1\alpha - a_2 = 0$ (1.47) 


Solving this quadratic equation, called the characteristic equation, yields two values 
of $\alpha$, called the characteristic roots. Using the quadratic formula, we find that the 
two characteristic roots are 

$\alpha_1, \alpha_2 = \frac{a_1 \pm \sqrt{a_1^2 + 4a_2}}{2} = (a_1 \pm \sqrt{d})/2$ (1.48) 


where $d$ is the discriminant $[a_1^2 + 4a_2]$. 

Each of these two characteristic roots yields a valid solution for (1.45). Again, 
these solutions are not unique. In fact, for any two arbitrary constants $A_1$ and $A_2$, the 
linear combination $A_1(\alpha_1)^t + A_2(\alpha_2)^t$ also solves (1.45). As proof, simply substitute 
$y_t = A_1(\alpha_1)^t + A_2(\alpha_2)^t$ into (1.45) to obtain 


$A_1(\alpha_1)^t + A_2(\alpha_2)^t = a_1[A_1(\alpha_1)^{t-1} + A_2(\alpha_2)^{t-1}] + a_2[A_1(\alpha_1)^{t-2} + A_2(\alpha_2)^{t-2}]$ 

Now, regroup terms as follows: 

$A_1[(\alpha_1)^t - a_1(\alpha_1)^{t-1} - a_2(\alpha_1)^{t-2}] + A_2[(\alpha_2)^t - a_1(\alpha_2)^{t-1} - a_2(\alpha_2)^{t-2}] = 0$ 


Since $\alpha_1$ and $\alpha_2$ each solve (1.45), both terms in brackets must equal zero. As such, the 
complete homogeneous solution in the second-order case is 


$y_t^h = A_1(\alpha_1)^t + A_2(\alpha_2)^t$ 


Without knowing the specific values of $a_1$ and $a_2$, we cannot find the two characteristic 
roots $\alpha_1$ and $\alpha_2$. Nevertheless, it is possible to characterize the nature of the solution; 
three possible cases are dependent on the value of the discriminant $d$. 


CASE 1 


If $a_1^2 + 4a_2 > 0$, $d$ is a real number and there will be two distinct real characteristic 
roots. Hence, there are two separate solutions to the homogeneous equation 
denoted by $(\alpha_1)^t$ and $(\alpha_2)^t$. We already know that any linear combination of the 
two is also a solution. Hence, 


$y_t^h = A_1(\alpha_1)^t + A_2(\alpha_2)^t$ 

WORKSHEET 1.1 SECOND-ORDER EQUATIONS 


Example 1: $y_t = 0.2y_{t-1} + 0.35y_{t-2}$. Hence: $a_1 = 0.2$ and $a_2 = 0.35$ 
Form the homogeneous equation: $y_t - 0.2y_{t-1} - 0.35y_{t-2} = 0$ 
A check of the discriminant reveals: $d = a_1^2 + 4a_2$ so that $d = 1.44$. Given 
that $d > 0$, the roots will be real and distinct. 
Let the trial solution have the form: $y_t = \alpha^t$. Substitute the trial solution into 
the homogeneous equation to obtain: $\alpha^t - 0.2\alpha^{t-1} - 0.35\alpha^{t-2} = 0$ 
Divide by $\alpha^{t-2}$ to obtain the characteristic equation: $\alpha^2 - 0.2\alpha - 0.35 = 0$ 
Compute the two characteristic roots: 
$\alpha_1 = 0.5(a_1 + d^{1/2})$, $\alpha_2 = 0.5(a_1 - d^{1/2})$ 
$\alpha_1 = 0.7$, $\alpha_2 = -0.5$ 
The homogeneous solution is: $A_1(0.7)^t + A_2(-0.5)^t$. The first graph shows 
the time path of this solution for the case in which the arbitrary constants 
equal unity and $t$ runs from 1 to 20. 

Example 2: $y_t = 0.7y_{t-1} + 0.35y_{t-2}$. Hence: $a_1 = 0.7$ and $a_2 = 0.35$ 
Form the homogeneous equation: $y_t - 0.7y_{t-1} - 0.35y_{t-2} = 0$ 
A check of the discriminant reveals: $d = a_1^2 + 4a_2$ so that $d = 1.89$. Given 
that $d > 0$, the roots will be real and distinct. 
Substitute the trial solution $y_t = \alpha^t$ to obtain: $\alpha^t - 0.7\alpha^{t-1} - 0.35\alpha^{t-2} = 0$ 
Divide by $\alpha^{t-2}$ to obtain the characteristic equation: $\alpha^2 - 0.7\alpha - 0.35 = 0$ 
Compute the two characteristic roots: 
$\alpha_1 = 0.5(a_1 + d^{1/2})$, $\alpha_2 = 0.5(a_1 - d^{1/2})$ 
$\alpha_1 = 1.037$, $\alpha_2 = -0.337$ 
The homogeneous solution is: $A_1(1.037)^t + A_2(-0.337)^t$. The second graph 
shows the time path of this solution for the case in which the arbitrary constants 
equal unity and $t$ runs from 1 to 20. 
1 Example 1 55 Example 2 
0.5 1.5 7 
0 0.5 l 
0 10 20 0 10 20 
It should be clear that if the absolute value of either $\alpha_1$ or $\alpha_2$ exceeds
unity, the homogeneous solution will explode. Worksheet 1.1 examines two
second-order equations showing real and distinct characteristic roots. In the
first example, $y_t = 0.2y_{t-1} + 0.35y_{t-2}$, the characteristic roots are shown to
be $\alpha_1 = 0.7$ and $\alpha_2 = -0.5$. Hence, the full homogeneous solution is
$y_t^h = A_1(0.7)^t + A_2(-0.5)^t$. Since both roots are less than unity in absolute value, the
homogeneous solution is convergent. As you can see in the graph on the bottom
left-hand side of Worksheet 1.1, convergence is not monotonic because of the
influence of the expression $(-0.5)^t$.

In the second example, $y_t = 0.7y_{t-1} + 0.35y_{t-2}$. The worksheet indicates
how to obtain the solution for the two characteristic roots. Given that one
characteristic root is 1.037, the $\{y_t\}$ sequence explodes. The negative root
($\alpha_2 = -0.337$) is responsible for the nonmonotonicity of the time path. Since $(-0.337)^t$
quickly approaches zero, the dominant root is the explosive value 1.037.
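
The root calculations in Worksheet 1.1 can be reproduced numerically. The following is a minimal sketch, not part of the text's own materials; the function names characteristic_roots and is_convergent are illustrative choices, and a complex square root is used so that the same function also covers a negative discriminant.

```python
import cmath

def characteristic_roots(a1, a2):
    """Roots of alpha**2 - a1*alpha - a2 = 0, the characteristic equation
    of y_t = a1*y_{t-1} + a2*y_{t-2}."""
    d = a1 ** 2 + 4 * a2                 # discriminant
    sqrt_d = cmath.sqrt(d)               # complex square root, so d < 0 is handled too
    return (a1 + sqrt_d) / 2, (a1 - sqrt_d) / 2

def is_convergent(a1, a2):
    """True when both roots are less than unity in absolute value."""
    return all(abs(root) < 1 for root in characteristic_roots(a1, a2))

for a1, a2 in [(0.2, 0.35), (0.7, 0.35)]:    # the two Worksheet 1.1 examples
    r1, r2 = characteristic_roots(a1, a2)
    print(a1, a2, round(r1.real, 3), round(r2.real, 3), is_convergent(a1, a2))
```

The first pair of coefficients should be reported as convergent and the second as explosive, in line with the worksheet.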


CASE 2 


If $a_1^2 + 4a_2 = 0$, it follows that d = 0 and $\alpha_1 = \alpha_2 = a_1/2$. Hence, a homogeneous
solution is $(a_1/2)^t$. However, when d = 0, there is a second homogeneous solution
given by $t(a_1/2)^t$. To demonstrate that $y_t^h = t(a_1/2)^t$ is a homogeneous solution,
substitute it into (1.45) to determine whether


$t(a_1/2)^t - a_1[(t-1)(a_1/2)^{t-1}] - a_2[(t-2)(a_1/2)^{t-2}] = 0$
Divide by $(a_1/2)^{t-2}$ and form
$-[(a_1^2/4) + a_2]t + [(a_1^2/2) + 2a_2] = 0$


Since we are operating in the circumstance where $a_1^2 + 4a_2 = 0$, each bracketed
expression is zero; hence, $t(a_1/2)^t$ solves (1.45). Again, for arbitrary constants
$A_1$ and $A_2$, the complete homogeneous solution is


$y_t^h = A_1(a_1/2)^t + A_2t(a_1/2)^t$


Clearly, the system is explosive if $|a_1| > 2$. If $|a_1| < 2$, the term $A_1(a_1/2)^t$
converges, but you might think that the effect of the term $t(a_1/2)^t$ is ambiguous
[since the diminishing $(a_1/2)^t$ is multiplied by t]. This ambiguity is correct in
the limited sense that the behavior of the homogeneous solution is not monotonic.
As illustrated in Figure 1.4 for $a_1/2 = 0.95$, 0.9, and $-0.9$, as long as
$|a_1| < 2$, $\lim[t(a_1/2)^t]$ is necessarily zero as $t \to \infty$; thus, there is always
convergence. For $0 < a_1 < 2$, the homogeneous solution appears to explode before
ultimately converging to zero. For $-2 < a_1 < 0$, the behavior is wildly erratic;
the homogeneous solution appears to oscillate explosively before the oscillations
dampen and finally converge to zero.
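
The "explode, then converge" pattern is easy to confirm by direct evaluation. The sketch below is illustrative only (the function name repeated_root_path and the horizon of 400 periods are arbitrary choices); it tabulates $t(a_1/2)^t$ for the three values of $a_1/2$ mentioned above and reports the largest absolute value reached along with the value at the end of the horizon.

```python
def repeated_root_path(root, horizon):
    """Values of t * root**t for t = 0, 1, ..., horizon."""
    return [t * root ** t for t in range(horizon + 1)]

for root in (0.95, 0.9, -0.9):               # the values of a1/2 discussed in the text
    path = repeated_root_path(root, 400)
    largest = max(abs(value) for value in path)
    print(root, round(largest, 3), abs(path[-1]))
```

In each case the path first moves away from zero and only later decays, which is the behavior described for Case 2.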
FIGURE 1.4 The homogeneous solution $t(a_1)^t$


CASE 3 


If $a_1^2 + 4a_2 < 0$, it follows that d is negative so that the characteristic roots are
imaginary. Since $a_1^2 \ge 0$, imaginary roots can occur only if $a_2 < 0$. Although this
might be hard to interpret directly, if we switch to polar coordinates it is possible
to transform the roots into more easily understood trigonometric functions. The
technical details are presented in Appendix 1.1 of the Supplementary Manual.
For now, write the two characteristic roots as


$\alpha_1 = (a_1 + i\sqrt{-d})/2$        $\alpha_2 = (a_1 - i\sqrt{-d})/2$
where $i = \sqrt{-1}$.
As shown in Appendix 1.1, de Moivre's theorem allows us to write the
homogeneous solution as
$y_t^h = \beta_1 r^t \cos(\theta t + \beta_2)$    (1.49)


where $\beta_1$ and $\beta_2$ are arbitrary constants, $r = (-a_2)^{1/2}$, and the value of $\theta$ is chosen
so as to satisfy
$\cos(\theta) = a_1/[2(-a_2)^{1/2}]$    (1.50)


The trigonometric functions impart a wavelike pattern to the time path of the
homogeneous solution; note that the frequency of the oscillations is determined
by $\theta$. Since $\cos(\theta t) = \cos(2\pi + \theta t)$, the stability condition is determined solely
by the magnitude of $r = (-a_2)^{1/2}$. If $|a_2| = 1$, the oscillations are of unchanging
amplitude; the homogeneous solution is periodic. The oscillations will dampen
if $|a_2| < 1$ and explode if $|a_2| > 1$.


Example: It is worthwhile to work through an exercise using an equation with
imaginary roots. The left-hand side of Worksheet 1.2 examines the behavior of
the equation $y_t = 1.6y_{t-1} - 0.9y_{t-2}$. A quick check shows that the discriminant d
is negative so that the characteristic roots are imaginary. If we transform to polar
coordinates, the value of r is given by $(0.9)^{1/2} = 0.949$. From (1.50), $\cos(\theta) =
1.6/(2 \times 0.949) = 0.843$. You can use a trig table or a calculator to show that
$\theta = 0.567$ (i.e., if $\cos(\theta) = 0.843$, $\theta = 0.567$). Thus, the homogeneous solution is


$y_t^h = \beta_1(0.949)^t \cos(0.567t + \beta_2)$    (1.51)
The graph on the left-hand side of Worksheet 1.2 sets $\beta_1 = 1$ and $\beta_2 = 0$ and
plots the homogeneous solution for t = 1, ... , 30. Example 2 uses the same value of
$a_2$ (hence, r = 0.949) but sets $a_1 = -0.6$. Again, the value of d is negative;
however, for this set of calculations, $\cos(\theta) = -0.316$ so that $\theta$ is 1.89. Comparing
the two graphs, you can see that increasing the value of $\theta$ acts to increase the
frequency of the oscillations.
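
As a check on the polar-coordinate calculations, the sketch below computes r and $\theta$ from (1.50) and compares the trigonometric form of the solution with direct iteration of the difference equation. It is a rough illustration under stated assumptions: the function names are hypothetical, and the arbitrary constants are set to $\beta_1 = 1$ and $\beta_2 = 0$ as in the worksheet graphs.

```python
import math

def polar_form(a1, a2):
    """Return (r, theta) for y_t = a1*y_{t-1} + a2*y_{t-2} when d < 0."""
    if a1 ** 2 + 4 * a2 >= 0:
        raise ValueError("roots are real; the polar form requires d < 0")
    r = math.sqrt(-a2)                       # r = (-a2)**(1/2)
    theta = math.acos(a1 / (2 * r))          # from equation (1.50)
    return r, theta

def iterate(a1, a2, y0, y1, horizon):
    """Iterate the homogeneous equation forward from two initial values."""
    path = [y0, y1]
    for t in range(2, horizon + 1):
        path.append(a1 * path[t - 1] + a2 * path[t - 2])
    return path

a1, a2 = 1.6, -0.9                           # Example 1 of Worksheet 1.2
r, theta = polar_form(a1, a2)
closed_form = [r ** t * math.cos(theta * t) for t in range(31)]   # beta1 = 1, beta2 = 0
iterated = iterate(a1, a2, closed_form[0], closed_form[1], 30)
print(round(r, 3), round(theta, 3))
print(max(abs(u - v) for u, v in zip(closed_form, iterated)))     # should be near zero
```

Starting the iteration from the first two values of the trigonometric form pins down the arbitrary constants, so the two paths should agree up to rounding error.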


Stability Conditions 


The general stability conditions can be summarized using triangle ABC in Figure 1.5.
Arc AOB is the boundary between Cases 1 and 3; it is the locus of points where
$d = a_1^2 + 4a_2 = 0$. The region above AOB corresponds to Case 1 (since d > 0), and the
region below AOB corresponds to Case 3 (since d < 0).

In Case 1 (in which the roots are real and distinct), stability requires that the largest
root be less than unity and the smallest root be greater than $-1$. The largest characteristic
root, $\alpha_1 = (a_1 + \sqrt{d})/2$, will be less than unity if


$a_1 + (a_1^2 + 4a_2)^{1/2} < 2$    or    $(a_1^2 + 4a_2)^{1/2} < 2 - a_1$


Square each side to obtain the condition:


$a_1^2 + 4a_2 < 4 - 4a_1 + a_1^2$
WORKSHEET 1.2 IMAGINARY ROOTS

Example 1: $y_t - 1.6y_{t-1} + 0.9y_{t-2} = 0$
Example 2: $y_t + 0.6y_{t-1} + 0.9y_{t-2} = 0$


(a) Check the discriminant $d = (a_1)^2 + 4a_2$

Example 1: $d = (1.6)^2 + 4(-0.9) = -1.04$
Example 2: $d = (-0.6)^2 + 4(-0.9) = -3.24$

Hence, the roots are imaginary. The homogeneous solution has the form

$y_t^h = \beta_1 r^t \cos(\theta t + \beta_2)$

where $\beta_1$ and $\beta_2$ are arbitrary constants.


(b) Obtain the value of $r = (-a_2)^{1/2}$

Example 1: $r = (0.9)^{1/2} = 0.949$
Example 2: $r = (0.9)^{1/2} = 0.949$


(c) Obtain $\theta$ from $\cos(\theta) = a_1/[2(-a_2)^{1/2}]$

Example 1: $\cos(\theta) = 1.6/[2(0.9)^{1/2}] = 0.843$
Example 2: $\cos(\theta) = -0.6/[2(0.9)^{1/2}] = -0.316$

Given $\cos(\theta)$, use a calculator or a trig table to find $\theta$:

Example 1: $\theta = 0.567$
Example 2: $\theta = 1.89$


(d) Form the homogeneous solution: $y_t^h = \beta_1 r^t \cos(\theta t + \beta_2)$

Example 1: $y_t^h = \beta_1(0.949)^t \cos(0.567t + \beta_2)$
Example 2: $y_t^h = \beta_1(0.949)^t \cos(1.89t + \beta_2)$


For $\beta_1 = 1$ and $\beta_2 = 0$, the time paths of the homogeneous solutions are:

[Graphs: time paths of the homogeneous solutions for Example 1 (left) and Example 2 (right)]
FIGURE 1.5 Characterizing the Stability Conditions


or
$a_1 + a_2 < 1$    (1.52)
The smallest root, $\alpha_2 = (a_1 - \sqrt{d})/2$, will be greater than minus one if
$a_1 - (a_1^2 + 4a_2)^{1/2} > -2$    or    $2 + a_1 > (a_1^2 + 4a_2)^{1/2}$
Square each side to obtain the condition:
$4 + 4a_1 + a_1^2 > a_1^2 + 4a_2$
or
$a_2 < 1 + a_1$    (1.53)


Thus, the region of stability in Case 1 consists of all points in the region bounded by
AOBC. For any point in AOBC, conditions (1.52) and (1.53) hold and d > 0.

In Case 2 (repeated roots), $a_1^2 + 4a_2 = 0$. The stability condition is $|a_1| < 2$. Thus,
the region of stability in Case 2 consists of all points on arc AOB. In Case 3 (d < 0), the
stability condition is $r = (-a_2)^{1/2} < 1$. Hence,


$-a_2 < 1$ (where $a_2 < 0$)    (1.54)


Thus, the region of stability in Case 3 consists of all points in region AOB. For any point
in AOB, (1.54) is satisfied and d < 0.

A succinct way to characterize the stability conditions is to state that the
characteristic roots must lie within the unit circle. Consider the semicircle drawn in Figure 1.6.
Real numbers are measured on the horizontal axis and imaginary numbers are
measured on the vertical axis. If the characteristic roots $\alpha_1$ and $\alpha_2$ are both real, they
can be plotted on the horizontal axis. Stability requires that they lie within a circle
of radius one. Complex roots will lie somewhere in the complex plane. If $a_1 > 0$, the
roots $\alpha_1 = (a_1 + i\sqrt{d})/2$ and $\alpha_2 = (a_1 - i\sqrt{d})/2$ can be represented by the two points
shown in Figure 1.6. For example, $\alpha_1$ is drawn by moving $a_1/2$ units along the real axis
and $\sqrt{d}/2$ units along the imaginary axis. Using the distance formula, the length of the
radius r is given by


$r = \sqrt{(a_1/2)^2 + (d^{1/2}i/2)^2}$
and, using the fact that $i^2 = -1$, we obtain
$r = (-a_2)^{1/2}$


The stability condition requires that r < 1. Therefore, when plotted on the complex
plane, the two roots $\alpha_1$ and $\alpha_2$ must lie within a circle of radius equal to unity. In the
time-series literature it is simply stated that stability requires that all characteristic
roots lie within the unit circle.


FIGURE 1.6 Characteristic Roots and the Unit Circle (horizontal axis: Real; vertical axis: Imaginary)
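
The equivalence between the coefficient restrictions and the unit-circle statement can be spot-checked numerically. The sketch below is an illustration, not the text's own procedure: it assumes that triangle ABC is described by imposing (1.52), (1.53), and (1.54) jointly, and the function names are hypothetical.

```python
import cmath

def roots_inside_unit_circle(a1, a2):
    """True when both roots of alpha**2 - a1*alpha - a2 = 0 have modulus below one."""
    sqrt_d = cmath.sqrt(a1 ** 2 + 4 * a2)
    return abs((a1 + sqrt_d) / 2) < 1 and abs((a1 - sqrt_d) / 2) < 1

def inside_triangle(a1, a2):
    """Inequalities (1.52), (1.53), and (1.54) imposed jointly (an assumption
    about how triangle ABC is read)."""
    return a1 + a2 < 1 and a2 < 1 + a1 and -a2 < 1

for a1, a2 in [(0.2, 0.35), (0.7, 0.35), (1.6, -0.9), (3.0, -2.1)]:
    print(a1, a2, roots_inside_unit_circle(a1, a2), inside_triangle(a1, a2))
```

For each pair of coefficients the two checks should agree. The last pair is included because it satisfies the first two inequalities yet has both roots above unity, so the third inequality matters even when d > 0.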


Higher Order Systems 


The same method can be used to find the homogeneous solution to higher order
difference equations. The homogeneous equation for (1.10) is
$y_t - \sum_{i=1}^{n} a_i y_{t-i} = 0$    (1.55)
Given the results in Section 1.4, you should suspect each homogeneous solution to have
the form $y_t^h = A\alpha^t$ where A is an arbitrary constant. Thus, to find the value(s) of $\alpha$, we
seek the solution for


$A\alpha^t - \sum_{i=1}^{n} a_i A\alpha^{t-i} = 0$    (1.56)


or, dividing through by $\alpha^{t-n}$, we seek the values of $\alpha$ that solve


$\alpha^n - a_1\alpha^{n-1} - a_2\alpha^{n-2} - \cdots - a_n = 0$    (1.57)
This nth-order polynomial will yield n solutions for $\alpha$. Denote these n characteristic
roots by $\alpha_1, \alpha_2, \ldots, \alpha_n$. As in Section 1.4, the linear combination
$A_1\alpha_1^t + A_2\alpha_2^t + \cdots + A_n\alpha_n^t$ is also a solution. The arbitrary constants $A_1$ through $A_n$ can be eliminated by
imposing n initial conditions on the general solution. The $\alpha_i$ may be real or complex
numbers. Stability requires that all real valued $\alpha_i$ be less than unity in absolute value.
Complex roots will necessarily come in pairs. Stability requires that all roots lie within
the unit circle shown in Figure 1.6.

In most circumstances there is little need to directly calculate the characteristic 
roots of higher order systems. Many of the technical details are included in Section 1.2 
of the Supplementary Manual (Appendix 1.2 of this chapter). However, there are some 
useful rules for checking the stability conditions in higher order systems. 


1. In an nth-order equation, a necessary condition for all characteristic roots to
   lie inside the unit circle is

   $\sum_{i=1}^{n} a_i < 1$

2. Since the values of the $a_i$ can be positive or negative, a sufficient condition for
   all characteristic roots to lie inside the unit circle is

   $\sum_{i=1}^{n} |a_i| < 1$

3. At least one characteristic root equals unity if

   $\sum_{i=1}^{n} a_i = 1$

   Any sequence that contains one or more characteristic roots that equal unity
   is called a unit root process.

4. For a third-order equation, the stability conditions can be written as

   $1 - a_1 - a_2 - a_3 > 0$
   $1 + a_1 - a_2 + a_3 > 0$
   $1 - a_1a_3 + a_2 - a_3^2 > 0$
   $3 + a_1 + a_2 - 3a_3 > 0$    or    $3 - a_1 + a_2 + 3a_3 > 0$

   Given that the first three inequalities are satisfied, either of the last two can
   be checked. One of the last conditions is redundant, given that the other three
   hold.
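
The four rules translate directly into simple checks on the coefficients. The following sketch is illustrative: the function names are hypothetical, and the test coefficients are made-up values constructed by expanding cubic polynomials with known roots, so that the expected verdict is known in advance.

```python
def necessary_condition(a):
    """Rule 1: the coefficients must sum to less than one."""
    return sum(a) < 1

def sufficient_condition(a):
    """Rule 2: the absolute values of the coefficients sum to less than one."""
    return sum(abs(ai) for ai in a) < 1

def sums_to_unity(a, tol=1e-12):
    """Rule 3: the coefficients sum to one, so at least one root equals unity."""
    return abs(sum(a) - 1) < tol

def third_order_stable(a1, a2, a3):
    """Rule 4, using the first of the two interchangeable final inequalities."""
    return (1 - a1 - a2 - a3 > 0
            and 1 + a1 - a2 + a3 > 0
            and 1 - a1 * a3 + a2 - a3 ** 2 > 0
            and 3 + a1 + a2 - 3 * a3 > 0)

# Hypothetical coefficients built from known characteristic roots
stable = (1.2, -0.47, 0.06)        # roots 0.5, 0.4, 0.3
explosive = (1.9, -0.96, 0.144)    # roots 1.2, 0.4, 0.3
unit_root = (1.7, -0.8, 0.1)       # roots 1.0, 0.5, 0.2

for a in (stable, explosive):
    print(a, necessary_condition(a), sufficient_condition(a), third_order_stable(*a))
print(unit_root, sums_to_unity(unit_root))
```

The stable example fails the sufficient condition even though all of its roots lie inside the unit circle, which illustrates that Rule 2 is sufficient but not necessary.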


7. PARTICULAR SOLUTIONS FOR 
DETERMINISTIC PROCESSES 


Finding the particular solution to a difference equation is often a matter of
ingenuity and perseverance. The appropriate technique depends heavily on the form of the
$\{x_t\}$ process. We begin by considering those processes that contain only deterministic
components. Of course, in econometric analysis, the forcing process will contain both
deterministic and stochastic components.


CASE 1 


$x_t = 0$. When all elements of the $\{x_t\}$ process are zero, the difference equation
becomes


$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \cdots + a_ny_{t-n}$    (1.58)


Intuition suggests that an unchanging value of y (i.e., $y_t = y_{t-1} = \cdots = c$) should
solve the equation. Substitute the trial solution $y_t = c$ into (1.58) to obtain


$c = a_0 + a_1c + a_2c + \cdots + a_nc$


so that
$c = a_0/(1 - a_1 - a_2 - \cdots - a_n)$    (1.59)


As long as $(1 - a_1 - a_2 - \cdots - a_n)$ does not equal zero, the value of c given by
(1.59) is a solution to (1.58). Hence, the particular solution to (1.58) is given by
$y_t^p = a_0/(1 - a_1 - a_2 - \cdots - a_n)$.

If $1 - a_1 - a_2 - \cdots - a_n = 0$, the value of c in (1.59) is undefined; it is
necessary to try some other form for the solution. The key insight is that $\{y_t\}$ is a
unit root process if $\sum a_i = 1$. Since $\{y_t\}$ is not convergent, it stands to reason that
the constant solution does not work. Instead, recall equations (1.12) and (1.26);
these solutions suggest that a linear time trend can appear in the solution of a unit
root process. As such, try the solution $y_t^p = ct$. For ct to be a solution it must be
the case that


$ct = a_0 + a_1c(t-1) + a_2c(t-2) + \cdots + a_nc(t-n)$
or, combining like terms,
$(1 - a_1 - a_2 - \cdots - a_n)ct = a_0 - c(a_1 + 2a_2 + 3a_3 + \cdots + na_n)$
Since $1 - a_1 - a_2 - \cdots - a_n = 0$, select the value of c such that
$c = a_0/(a_1 + 2a_2 + 3a_3 + \cdots + na_n)$


For example, let
$y_t = 2 + 0.75y_{t-1} + 0.25y_{t-2}$


Here, $a_1 = 0.75$ and $a_2 = 0.25$; $\{y_t\}$ is a unit root process because
$a_1 + a_2 = 1$. The particular solution has the form ct, where $c = 2/[0.75 + 2(0.25)] = 1.6$.
In the event that the solution ct fails, sequentially try the solutions
$y_t^p = ct^2, ct^3, \ldots, ct^n$. For an nth-order equation, one of these solutions will always be
the particular solution.
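
The two branches of Case 1 (a constant when the coefficients do not sum to one, a linear trend when they do) can be written as a short routine. This is a sketch under stated assumptions: the function name and the tolerance tol are illustrative, and the routine stops at the linear trend rather than going on to try higher powers of t.

```python
def deterministic_particular_solution(a0, a, tol=1e-12):
    """Return ("constant", c) or ("trend", c) for y_t = a0 + sum_i a[i-1]*y_{t-i}."""
    a_sum = sum(a)
    if abs(1 - a_sum) > tol:
        return "constant", a0 / (1 - a_sum)               # equation (1.59)
    weighted = sum(i * ai for i, ai in enumerate(a, start=1))
    if abs(weighted) <= tol:
        raise ValueError("c*t fails as well; try c*t**2, c*t**3, ...")
    return "trend", a0 / weighted                          # c = a0/(a1 + 2*a2 + ... + n*an)

kind, c = deterministic_particular_solution(2, [0.75, 0.25])
print(kind, c)
# Residual from substituting y_t = c*t into y_t = 2 + 0.75*y_{t-1} + 0.25*y_{t-2}
print(max(abs(c * t - (2 + 0.75 * c * (t - 1) + 0.25 * c * (t - 2))) for t in range(2, 12)))
```

For the example in the text the routine should select the trend branch with c = 1.6, and the substitution residual should be zero up to rounding error.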
CASE 2


The Exponential Case. Let $x_t$ have the exponential form $b(d)^{rt}$, where b, d, and
r are constants. Since r has the natural interpretation as a growth rate, we would
expect to encounter this type of forcing process in a growth context. We
illustrate the solution procedure using the first-order equation


$y_t = a_0 + a_1y_{t-1} + bd^{rt}$    (1.60)


To try to gain an intuitive feel for the form of the solution, notice that if b = 0,
(1.60) is a special case of (1.58). Hence, you should expect a constant to appear in
the particular solution. Moreover, the expression $d^{rt}$ grows at the constant rate r.
Thus, you might expect the particular solution to have the form $y_t^p = c_0 + c_1d^{rt}$,
where $c_0$ and $c_1$ are constants. If this equation is actually a solution, you should be
able to substitute it back into (1.60) and obtain an identity. Making the appropriate
substitutions, we get


$c_0 + c_1d^{rt} = a_0 + a_1[c_0 + c_1d^{r(t-1)}] + bd^{rt}$    (1.61)
For this solution to work, it is necessary to select $c_0$ and $c_1$ such that

$c_0 = a_0/(1 - a_1)$ and $c_1 = [bd^r]/(d^r - a_1)$
Thus, a particular solution is


$y_t^p = \frac{a_0}{1 - a_1} + \frac{bd^r}{d^r - a_1}d^{rt}$


The nature of the solution is that $y_t^p$ equals the constant $a_0/(1 - a_1)$ plus an
expression that grows at the rate r. Note that for $|d^r| < 1$, the particular solution
converges to $a_0/(1 - a_1)$.

If either $a_1 = 1$ or $a_1 = d^r$, use the trick suggested in Case 1. If $a_1 = 1$, try
the solution $c_0 = ct$, and if $a_1 = d^r$, try the solution $c_1 = tb$. Use precisely the
same methodology in higher order systems.
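
A quick numerical substitution confirms that the coefficients $c_0$ and $c_1$ turn (1.61) into an identity. The sketch below uses made-up parameter values (they do not come from the text) and a hypothetical function name; it covers only the regular case in which $a_1 \ne 1$ and $a_1 \ne d^r$.

```python
def exponential_particular_solution(a0, a1, b, d, r):
    """Coefficients (c0, c1) of y_t = c0 + c1*d**(r*t) for equation (1.60)."""
    growth = d ** r
    if a1 == 1 or a1 == growth:
        raise ValueError("a1 = 1 or a1 = d**r: multiply the affected term by t instead")
    return a0 / (1 - a1), b * growth / (growth - a1)

a0, a1, b, d, r = 1.0, 0.5, 2.0, 1.1, 1.0    # hypothetical values
c0, c1 = exponential_particular_solution(a0, a1, b, d, r)

def y(t):
    """Candidate particular solution."""
    return c0 + c1 * d ** (r * t)

print(c0, c1)
# Residual of (1.60) at the candidate solution; should be near zero for every t
print(max(abs(y(t) - (a0 + a1 * y(t - 1) + b * d ** (r * t))) for t in range(1, 20)))
```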


CASE 3 


Deterministic Time Trend. In this case, let the $\{x_t\}$ sequence be represented by
the relationship $x_t = bt^d$ where b is a constant and d is a positive integer. Hence,


$y_t = a_0 + \sum_{i=1}^{n} a_iy_{t-i} + bt^d$    (1.62)


Since $y_t$ depends on $t^d$, it follows that $y_{t-1}$ depends on $(t-1)^d$, $y_{t-2}$
depends on $(t-2)^d$, and so on. As such, the particular solution has the form
$y_t^p = c_0 + c_1t + c_2t^2 + \cdots + c_dt^d$. To find the value of each $c_i$, substitute the
particular solution into (1.62). Then select the value of each $c_i$ that results in an
identity. Although various values of d are possible, in economic applications it is
common to see models incorporating a linear time trend (d = 1). For illustrative
purposes, consider the second-order equation $y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + bt$.
Posit the solution $y_t^p = c_0 + c_1t$ where $c_0$ and $c_1$ are undetermined coefficients.
Substituting this "challenge solution" into the second-order difference equation
yields


$c_0 + c_1t = a_0 + a_1[c_0 + c_1(t-1)] + a_2[c_0 + c_1(t-2)] + bt$    (1.63)


Now select values of $c_0$ and $c_1$ so as to force equation (1.63) to be an
identity for all possible values of t. If we combine all constant terms and all terms
involving t, the required values of $c_0$ and $c_1$ are


$c_1 = b/(1 - a_1 - a_2)$
$c_0 = [a_0 - (2a_2 + a_1)c_1]/(1 - a_1 - a_2)$


so that
$c_0 = [a_0/(1 - a_1 - a_2)] - [b/(1 - a_1 - a_2)^2](2a_2 + a_1)$


Thus, the particular solution will also contain a linear time trend. You
should have no difficulty foreseeing the solution technique if $a_1 + a_2 = 1$. In
this circumstance (which is applicable to higher order cases, as well) try
multiplying the original challenge solution by t.
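
The coefficient formulas for the linear-trend case can be verified by substituting them back into the difference equation. The following sketch assumes $a_1 + a_2 \ne 1$; the parameter values and the function name are hypothetical and serve only to exercise the formulas.

```python
def linear_trend_particular_solution(a0, a1, a2, b):
    """Coefficients (c0, c1) of y_t = c0 + c1*t for
    y_t = a0 + a1*y_{t-1} + a2*y_{t-2} + b*t, assuming a1 + a2 != 1."""
    denom = 1 - a1 - a2
    if denom == 0:
        raise ValueError("a1 + a2 = 1: multiply the challenge solution by t")
    c1 = b / denom
    c0 = (a0 - (2 * a2 + a1) * c1) / denom
    return c0, c1

a0, a1, a2, b = 3.0, 0.9, -0.2, 0.5          # hypothetical values
c0, c1 = linear_trend_particular_solution(a0, a1, a2, b)

def y(t):
    """Candidate particular solution."""
    return c0 + c1 * t

print(round(c0, 4), round(c1, 4))
# Residual of (1.63); should be near zero for every t
print(max(abs(y(t) - (a0 + a1 * y(t - 1) + a2 * y(t - 2) + b * t)) for t in range(2, 30)))
```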


8. THE METHOD OF UNDETERMINED 
COEFFICIENTS 


At this point, it is appropriate to introduce the first of two useful methods for finding
particular solutions when there are stochastic components in the $\{y_t\}$ process. The key
insight of the method of undetermined coefficients is that linear equations have linear
solutions. Hence, the particular solution to a linear difference equation is necessarily
linear. Moreover, the solution can depend only on time, a constant, and the elements of
the forcing process $\{x_t\}$. Thus, it is often possible to know the exact form of the
solution even though the coefficients of the solution are unknown. The technique involves
positing a solution (called a challenge solution) that is a linear function of all terms
thought to appear in the actual solution. The problem becomes one of finding the set
of values for those undetermined coefficients that solve the difference equation.

The actual technique for finding the coefficients is straightforward. Substitute the
challenge solution into the original difference equation and solve for the values of the
undetermined coefficients that yield an identity for all possible values of the included
variables. If it is not possible to obtain an identity, the form of the challenge solution is
incorrect. Try a new trial solution and repeat the process. In fact, we used the method
of undetermined coefficients when positing the challenge solutions $y_t^p = c_0 + c_1d^{rt}$ and
$y_t^p = c_0 + c_1t$ for Cases 2 and 3 in Section 1.7.

To begin, reconsider the simple first-order equation $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$. Since
you have solved this equation using the iterative method, the equation is useful for
illustrating the method of undetermined coefficients. The nature of the $\{y_t\}$ process
is such that the particular solution can depend only on a constant term, time, and the
individual elements of the $\{\varepsilon_t\}$ sequence. Given that t does not explicitly appear in the
forcing process, t can be in the particular solution only if the characteristic root is unity.
Since the goal is to illustrate the method, posit the challenge solution:


$y_t = b_0 + b_1t + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-i}$    (1.64)
where $b_0$, $b_1$, and all the $\alpha_i$ are the coefficients to be determined.
Substitute (1.64) into the original difference equation to form
$b_0 + b_1t + \alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \alpha_2\varepsilon_{t-2} + \cdots$
$= a_0 + a_1[b_0 + b_1(t-1) + \alpha_0\varepsilon_{t-1} + \alpha_1\varepsilon_{t-2} + \cdots] + \varepsilon_t$
Collecting like terms, we obtain
$(b_0 - a_0 - a_1b_0 + a_1b_1) + (b_1 - a_1b_1)t + (\alpha_0 - 1)\varepsilon_t$
$+ (\alpha_1 - a_1\alpha_0)\varepsilon_{t-1} + (\alpha_2 - a_1\alpha_1)\varepsilon_{t-2} + (\alpha_3 - a_1\alpha_2)\varepsilon_{t-3} + \cdots = 0$    (1.65)
Equation (1.65) must hold for all values of t and all possible values of the $\{\varepsilon_t\}$
sequence. Thus, each of the following conditions must hold:


$\alpha_0 - 1 = 0$
$\alpha_1 - a_1\alpha_0 = 0$
$\alpha_2 - a_1\alpha_1 = 0$
$\vdots$
$b_0 - a_0 - a_1b_0 + a_1b_1 = 0$
$b_1 - a_1b_1 = 0$


Notice that the first set of conditions can be solved for the $\alpha_i$ recursively. The
solution of the first condition entails setting $\alpha_0 = 1$. Given this solution for $\alpha_0$, the next
equation requires $\alpha_1 = a_1$. Moving down the list, $\alpha_2 = a_1\alpha_1$ or $\alpha_2 = a_1^2$. Continuing
the recursive process, we find $\alpha_i = a_1^i$. Now consider the last two equations. There are
two possible cases depending on the value of $a_1$. If $a_1 \ne 1$, it immediately follows that
$b_1 = 0$ and $b_0 = a_0/(1 - a_1)$. For this case, the particular solution is


$y_t = \frac{a_0}{1 - a_1} + \sum_{i=0}^{\infty} a_1^i\varepsilon_{t-i}$


Compare this result to (1.21); you will see that it is precisely the same solution
found using the iterative method. The general solution is the sum of this particular
solution plus the homogeneous solution $Aa_1^t$. Hence, the general solution is


$y_t = \frac{a_0}{1 - a_1} + \sum_{i=0}^{\infty} a_1^i\varepsilon_{t-i} + Aa_1^t$


Now, if there is an initial condition for $y_0$, it follows that


$y_0 = \frac{a_0}{1 - a_1} + \sum_{i=0}^{\infty} a_1^i\varepsilon_{-i} + A$
Combining these two equations so as to eliminate the arbitrary constant A, we obtain


$y_t = \frac{a_0}{1 - a_1} + \sum_{i=0}^{\infty} a_1^i\varepsilon_{t-i} + a_1^t\left[y_0 - a_0/(1 - a_1) - \sum_{i=0}^{\infty} a_1^i\varepsilon_{-i}\right]$
so that
$y_t = \frac{a_0}{1 - a_1} + \sum_{i=0}^{t-1} a_1^i\varepsilon_{t-i} + a_1^t[y_0 - a_0/(1 - a_1)]$    (1.66)


It can be easily verified that (1.66) is identical to (1.25). Instead, if $a_1 = 1$, $b_0$ can be
any arbitrary constant and $b_1 = a_0$. The improper form of the solution is


$y_t = b_0 + a_0t + \sum_{i=0}^{\infty} \varepsilon_{t-i}$


The form of the solution is "improper" because the sum of the $\{\varepsilon_t\}$ sequence may not
be finite. Therefore, it is necessary to impose an initial condition. If the value $y_0$ is
given, it follows that
$y_0 = b_0 + \sum_{i=0}^{\infty} \varepsilon_{-i}$


Imposing the initial condition on the improper form of the solution yields (1.26)


$y_t = y_0 + a_0t + \sum_{i=1}^{t} \varepsilon_i$


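To take a second example, consider the equation
$y_t = a_0 + a_1y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1}$    (1.67)


Again, the solution can depend only on a constant, the elements of the $\{\varepsilon_t\}$ sequence,
and t raised to the first power. As in the previous example, t does not need to be included
in the challenge solution if the characteristic root differs from unity. To reinforce this
point, use the challenge solution given by (1.64). Substitute this tentative solution into
(1.67) to obtain


$b_0 + b_1t + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-i} = a_0 + a_1\left[b_0 + b_1(t-1) + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-1-i}\right] + \varepsilon_t + \beta_1\varepsilon_{t-1}$


Matching coefficients on all terms containing $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$, yields

$\alpha_0 = 1$
$\alpha_1 = a_1\alpha_0 + \beta_1$    [so that $\alpha_1 = a_1 + \beta_1$]
$\alpha_2 = a_1\alpha_1$    [so that $\alpha_2 = a_1(a_1 + \beta_1)$]
$\alpha_3 = a_1\alpha_2$    [so that $\alpha_3 = (a_1)^2(a_1 + \beta_1)$]
$\vdots$
$\alpha_i = a_1\alpha_{i-1}$    [so that $\alpha_i = (a_1)^{i-1}(a_1 + \beta_1)$]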
Matching coefficients of intercept terms and coefficients of terms containing t, we get


$b_0 = a_0 + a_1b_0 - a_1b_1$
$b_1 = a_1b_1$


Again, there are two cases. If $a_1 \ne 1$, then $b_1 = 0$ and $b_0 = a_0/(1 - a_1)$. The particular
solution is


$y_t = \frac{a_0}{1 - a_1} + \varepsilon_t + (a_1 + \beta_1)\sum_{i=1}^{\infty} a_1^{i-1}\varepsilon_{t-i}$


The general solution augments the particular solution with the term $Aa_1^t$. You are
left with the exercise of imposing the initial condition for $y_0$ on the general solution.
Now consider the case in which $a_1 = 1$. The undetermined coefficients are such that
$b_1 = a_0$ and $b_0$ is an arbitrary constant. The improper form of the solution is


$y_t = b_0 + a_0t + \varepsilon_t + (1 + \beta_1)\sum_{i=1}^{\infty} \varepsilon_{t-i}$


If $y_0$ is given, it follows that


$y_0 = b_0 + \varepsilon_0 + (1 + \beta_1)\sum_{i=1}^{\infty} \varepsilon_{-i}$


Hence, imposing the initial condition, we obtain


$y_t = y_0 + a_0t + \varepsilon_t + (1 + \beta_1)\sum_{i=1}^{t-1} \varepsilon_i$
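
The coefficients $\alpha_i$ found for (1.67) in the case $|a_1| < 1$ can be checked against a direct simulation: feed a single unit shock into the equation, set every other shock and the intercept to zero, and trace out the response of y. The sketch below does exactly that. The parameter values are hypothetical and the function names are illustrative.

```python
def impulse_response(a1, beta1, horizon):
    """Response of y to a one-unit shock at date 0, found by iterating
    y_t = a1*y_{t-1} + e_t + beta1*e_{t-1} with all other shocks set to zero."""
    shocks = [1.0] + [0.0] * horizon
    y_prev, e_prev, path = 0.0, 0.0, []
    for e in shocks:
        y = a1 * y_prev + e + beta1 * e_prev
        path.append(y)
        y_prev, e_prev = y, e
    return path

def alpha(a1, beta1, i):
    """Undetermined-coefficient result: alpha_0 = 1, alpha_i = a1**(i-1)*(a1 + beta1)."""
    return 1.0 if i == 0 else a1 ** (i - 1) * (a1 + beta1)

a1, beta1 = 0.8, 0.3                         # hypothetical values with |a1| < 1
simulated = impulse_response(a1, beta1, 10)
print(max(abs(simulated[i] - alpha(a1, beta1, i)) for i in range(11)))   # should be near zero
```

Setting beta1 to zero reduces the coefficients to $a_1^i$, the result obtained for the first example.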


Higher Order Systems 


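The identical procedure is used for higher order systems. As an example, let us find the
particular solution to the second-order equation


$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \varepsilon_t$    (1.68)
Since we have a second-order equation, we use the challenge solution
$y_t = b_0 + b_1t + b_2t^2 + \alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \alpha_2\varepsilon_{t-2} + \cdots$


where $b_0$, $b_1$, $b_2$, and the $\alpha_i$ are the undetermined coefficients.
Substituting the challenge solution into (1.68) yields


$[b_0 + b_1t + b_2t^2] + \alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \alpha_2\varepsilon_{t-2} + \cdots$

$= a_0 + a_1[b_0 + b_1(t-1) + b_2(t-1)^2 + \alpha_0\varepsilon_{t-1} + \alpha_1\varepsilon_{t-2} + \alpha_2\varepsilon_{t-3} + \cdots]$

$+ a_2[b_0 + b_1(t-2) + b_2(t-2)^2 + \alpha_0\varepsilon_{t-2} + \alpha_1\varepsilon_{t-3} + \alpha_2\varepsilon_{t-4} + \cdots] + \varepsilon_t$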
There are several necessary and sufficient conditions for the values of the $\alpha_i$'s to
render the equation above an identity for all possible realizations of the $\{\varepsilon_t\}$ sequence:


$\alpha_0 = 1$
$\alpha_1 = a_1\alpha_0$    [so that $\alpha_1 = a_1$]
$\alpha_2 = a_1\alpha_1 + a_2\alpha_0$    [so that $\alpha_2 = (a_1)^2 + a_2$]
$\alpha_3 = a_1\alpha_2 + a_2\alpha_1$    [so that $\alpha_3 = (a_1)^3 + 2a_1a_2$]


Notice that for any value of $j \ge 2$, the coefficients solve the second-order difference
equation $\alpha_j = a_1\alpha_{j-1} + a_2\alpha_{j-2}$. Since we know $\alpha_0$ and $\alpha_1$, we can solve for all the $\alpha_j$
iteratively. The properties of the coefficients will be precisely those discussed when
considering homogeneous solutions:


1. Convergence necessitates that $|a_2| < 1$, $a_1 + a_2 < 1$, and that $a_2 - a_1 < 1$.
   Notice that convergence implies that past values of the $\{\varepsilon_t\}$ sequence
   ultimately have a successively smaller influence on the current
   value of $y_t$.

2. If the coefficients converge, convergence will be direct or oscillatory if
   $(a_1^2 + 4a_2) > 0$, will follow a sine/cosine pattern if $(a_1^2 + 4a_2) < 0$, and will
   "explode" and then converge if $(a_1^2 + 4a_2) = 0$.

Appropriately setting the $\alpha_i$, we are left with the remaining expression:


$b_2(1 - a_1 - a_2)t^2 + [b_1(1 - a_1 - a_2) + 2b_2(a_1 + 2a_2)]t$
$+ [b_0(1 - a_1 - a_2) - a_0 + a_1(b_1 - b_2) + 2a_2(b_1 - 2b_2)] = 0$    (1.69)


Equation (1.69) must equal zero for all values of t. First, consider the case in which
$a_1 + a_2 \ne 1$. Since $(1 - a_1 - a_2)$ does not vanish, it is necessary to set the value of $b_2$
equal to zero. Given that $b_2 = 0$ and that the coefficient of t must equal zero, it
follows that $b_1$ must also be set equal to zero. Finally, given that $b_1 = b_2 = 0$, we must
set $b_0 = a_0/(1 - a_1 - a_2)$. Instead, if $a_1 + a_2 = 1$, the solutions for the $b_i$ depend on
the specific values of $a_0$, $a_1$, and $a_2$. The key point is that the stability condition for
the homogeneous equation is precisely the condition for convergence of the particular
solution. If any characteristic root of the homogeneous equation is equal to unity, a
polynomial time trend will appear in the particular solution. The order of the
polynomial is the number of unitary characteristic roots. This result generalizes to higher order
equations.

If you are really clever, you can combine the discussion of the last section with the
method of undetermined coefficients. Find the deterministic portion of the particular
solution using the techniques discussed in the last section. Then use the method of
undetermined coefficients to find the stochastic portion of the particular solution. In
(1.67), for example, set $\varepsilon_t = \varepsilon_{t-1} = 0$ and obtain the solution $a_0/(1 - a_1)$. Now use
the method of undetermined coefficients to find the particular solution of
$y_t = a_1y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1}$. Add the deterministic and stochastic components to obtain all components
of the particular solution.
A Solved Problem


To illustrate the methodology using a second-order equation, augment (1.28) with the
stochastic term $\varepsilon_t$ so that


$y_t = 3 + 0.9y_{t-1} - 0.2y_{t-2} + \varepsilon_t$    (1.70)


You have already verified that the two homogeneous solutions are $A_1(0.5)^t$ and $A_2(0.4)^t$
and that the deterministic portion of the particular solution is $y_t^p = 10$. To find the
stochastic portion of the particular solution, form the challenge solution


$y_t = \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-i}$


In contrast to (1.64), the intercept term $b_0$ is excluded (since we have already found
the deterministic portion of the particular solution) and the time trend $b_1t$ is excluded
(since both characteristic roots are less than unity). For this challenge to work, it must
satisfy


$\alpha_0\varepsilon_t + \alpha_1\varepsilon_{t-1} + \alpha_2\varepsilon_{t-2} + \alpha_3\varepsilon_{t-3} + \cdots$

$= 0.9[\alpha_0\varepsilon_{t-1} + \alpha_1\varepsilon_{t-2} + \alpha_2\varepsilon_{t-3} + \alpha_3\varepsilon_{t-4} + \cdots]$
$- 0.2[\alpha_0\varepsilon_{t-2} + \alpha_1\varepsilon_{t-3} + \alpha_2\varepsilon_{t-4} + \alpha_3\varepsilon_{t-5} + \cdots] + \varepsilon_t$    (1.71)


Since (1.71) must hold for all possible realizations of $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$, each of the
following conditions must hold:


$\alpha_0 = 1$
$\alpha_1 = 0.9\alpha_0$
so that $\alpha_1 = 0.9$, and for all $i \ge 2$,
$\alpha_i = 0.9\alpha_{i-1} - 0.2\alpha_{i-2}$    (1.72)


Now, it is possible to solve (1.72) iteratively so that $\alpha_2 = 0.9\alpha_1 - 0.2\alpha_0 = 0.61$,
$\alpha_3 = 0.9(0.61) - 0.2(0.9) = 0.369$, and so forth. A more elegant solution method is to view
(1.72) as a second-order difference equation in the $\{\alpha_i\}$ sequence with initial conditions
$\alpha_0 = 1$ and $\alpha_1 = 0.9$. The solution to (1.72) is


$\alpha_i = 5(0.5)^i - 4(0.4)^i$    (1.73)


To obtain (1.73), note that the solution to (1.72) is $\alpha_i = A_3(0.5)^i + A_4(0.4)^i$
where $A_3$ and $A_4$ are arbitrary constants. Imposing the conditions $\alpha_0 = 1$ and
$\alpha_1 = 0.9$ yields (1.73). If we use (1.73), it follows that $\alpha_0 = 5(0.5)^0 - 4(0.4)^0 = 1$;
$\alpha_1 = 5(0.5)^1 - 4(0.4)^1 = 0.9$; $\alpha_2 = 5(0.5)^2 - 4(0.4)^2 = 0.61$; and so on.
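
The agreement between the iterative solution of (1.72) and the closed form (1.73) can be confirmed in a few lines. This is a minimal sketch with hypothetical function names; the numbers 0.9, 0.2, 0.5, and 0.4 are those of the solved problem.

```python
def alpha_by_recursion(n):
    """alpha_0, ..., alpha_n from alpha_i = 0.9*alpha_{i-1} - 0.2*alpha_{i-2},
    starting from alpha_0 = 1 and alpha_1 = 0.9, as in (1.72)."""
    alphas = [1.0, 0.9]
    for i in range(2, n + 1):
        alphas.append(0.9 * alphas[i - 1] - 0.2 * alphas[i - 2])
    return alphas[: n + 1]

def alpha_closed_form(i):
    """Equation (1.73)."""
    return 5 * 0.5 ** i - 4 * 0.4 ** i

recursive = alpha_by_recursion(20)
print([round(x, 3) for x in recursive[:4]])
print(max(abs(recursive[i] - alpha_closed_form(i)) for i in range(21)))   # should be near zero
```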

The general solution to (1.70) is the sum of the two homogeneous solutions and
the deterministic and stochastic portions of the particular solution:

$y_t = 10 + A_1(0.5)^t + A_2(0.4)^t + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{t-i}$    (1.74)

where the $\alpha_i$ are given by (1.73).
Given initial conditions for $y_0$ and $y_1$, it follows that $A_1$ and $A_2$ must satisfy


$y_0 = 10 + A_1 + A_2 + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{-i}$    (1.75)
$y_1 = 10 + A_1(0.5) + A_2(0.4) + \sum_{i=0}^{\infty} \alpha_i\varepsilon_{1-i}$    (1.76)


Although the algebra gets messy, (1.75) and (1.76) can be substituted into (1.74) to
eliminate the arbitrary constants:


$y_t = 10 + (0.4)^t[5(y_0 - 10) - 10(y_1 - 10)]$

$+ (0.5)^t[10(y_1 - 10) - 4(y_0 - 10)] + \sum_{i=0}^{t-2} \alpha_i\varepsilon_{t-i}$


9. LAG OPERATORS 


If it is not important to know the actual values of the coefficients appearing in the par- 
ticular solution, it is often more convenient to use lag operators rather than the method 
of undetermined coefficients. The lag operator L is defined to be a linear operator such
that for any value $y_t$,


$L^iy_t = y_{t-i}$    (1.77)


Thus, $L^i$ preceding $y_t$ simply means to lag $y_t$ by i periods. It is useful to consider
the following properties of lag operators:


1. The lag of a constant is a constant: Lc = c.

2. The distributive law holds for lag operators. We can set $(L^i + L^j)y_t = L^iy_t +
   L^jy_t = y_{t-i} + y_{t-j}$.

3. The associative law of multiplication holds for lag operators. We can set
   $L^iL^jy_t = L^i(L^jy_t) = L^iy_{t-j} = y_{t-i-j}$. Similarly, we can set $L^iL^jy_t = L^{i+j}y_t =
   y_{t-i-j}$. Note that $L^0y_t = y_t$.

4. L raised to a negative power is actually a lead operator: $L^{-i}y_t = y_{t+i}$. To
   explain, define $j = -i$ and form $L^jy_t = y_{t-j} = y_{t+i}$.

5. For $|a| < 1$, the infinite sum $(1 + aL + a^2L^2 + a^3L^3 + \cdots)y_t = y_t/(1 - aL)$.
   This property of lag operators may not seem intuitive, but it follows directly
   from properties 2 and 3 above.

   Proof: Multiply each side by $(1 - aL)$ to form $(1 - aL)(1 + aL + a^2L^2 +
   a^3L^3 + \cdots)y_t = y_t$. Multiply the two expressions to obtain $(1 - aL + aL -
   a^2L^2 + a^2L^2 - a^3L^3 + \cdots)y_t = y_t$. Given that $|a| < 1$, the expression $a^nL^ny_t$
   converges to zero as $n \to \infty$. Thus, the two sides of the equation are equal.

6. For $|a| > 1$, the infinite sum $[1 + (aL)^{-1} + (aL)^{-2} + (aL)^{-3} + \cdots]y_t =
   -aLy_t/(1 - aL)$. Thus, $y_t/(1 - aL) = -(aL)^{-1}\sum_{i=0}^{\infty}(aL)^{-i}y_t$.

   Proof: Multiply by $(1 - aL)$ to form $(1 - aL)[1 + (aL)^{-1} + (aL)^{-2} +
   (aL)^{-3} + \cdots]y_t = -aLy_t$. Perform the indicated multiplication to obtain $[1 -
   aL + (aL)^{-1} - 1 + (aL)^{-2} - (aL)^{-1} + (aL)^{-3} - (aL)^{-2} + \cdots]y_t = -aLy_t$. Given
   that $|a| > 1$, the expression $a^{-n}L^{-n}y_t$ converges to zero as $n \to \infty$. Thus, the
   two sides of the equation are equal.
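
Property 5 can be illustrated on a finite sample. The sketch below applies the expansion $(1 + aL + a^2L^2 + \cdots)$ to a short sequence and then applies $(1 - aL)$ to the result, which should return the original sequence. It rests on an assumption that is not in the text: values before the start of the sample are treated as zero, so the infinite sum is truncated. The sequence, the value of a, and the function names are all hypothetical.

```python
def apply_inverse(a, x):
    """Truncated (1 + aL + a^2 L^2 + ...) x_t, treating pre-sample values of x as zero."""
    return [sum(a ** i * x[t - i] for i in range(t + 1)) for t in range(len(x))]

def apply_one_minus_aL(a, z):
    """(1 - aL) z_t = z_t - a*z_{t-1}, with the pre-sample value z_{-1} taken to be zero."""
    return [z[t] - (a * z[t - 1] if t > 0 else 0.0) for t in range(len(z))]

a = 0.6                                       # hypothetical value with |a| < 1
x = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5]           # hypothetical sequence
z = apply_inverse(a, x)
recovered = apply_one_minus_aL(a, z)
print(max(abs(u - v) for u, v in zip(x, recovered)))   # should be near zero
```

With pre-sample values set to zero the cancellation is exact for any a; the restriction $|a| < 1$ is what guarantees convergence when the history of the sequence is infinite.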


Lag operators provide a concise notation for writing difference equations. Using
lag operators, we can write the pth-order equation $y_t = a_0 + a_1y_{t-1} + \cdots + a_py_{t-p} + \varepsilon_t$
as

$(1 - a_1L - a_2L^2 - \cdots - a_pL^p)y_t = a_0 + \varepsilon_t$


or, more compactly, as
$A(L)y_t = a_0 + \varepsilon_t$


where A(L) is the polynomial $(1 - a_1L - a_2L^2 - \cdots - a_pL^p)$.
Since A(L) can be viewed as a polynomial in the lag operator, the notation A(1) is
used to denote the sum of the coefficients


$A(1) = 1 - a_1 - a_2 - \cdots - a_p$


As a second example, lag operators can be used to express the equation
$y_t = a_0 + a_1y_{t-1} + \cdots + a_py_{t-p} + \varepsilon_t + \beta_1\varepsilon_{t-1} + \cdots + \beta_q\varepsilon_{t-q}$ as


$A(L)y_t = a_0 + B(L)\varepsilon_t$


where A(L) and B(L) are polynomials of orders p and q, respectively.

It is straightforward to use lag operators to solve linear difference equations. Again
consider the first-order equation $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$ where $|a_1| < 1$. Use the
definition of L to form

$y_t = a_0 + a_1Ly_t + \varepsilon_t$    (1.78)


Solving for $y_t$, we obtain

$y_t = \frac{a_0 + \varepsilon_t}{1 - a_1L}$    (1.79)

From property 1, we know that $La_0 = a_0$, so that $a_0/(1 - a_1L) = a_0 + a_1a_0 +
a_1^2a_0 + \cdots = a_0/(1 - a_1)$. From property 5, we know that $\varepsilon_t/(1 - a_1L) =
\varepsilon_t + a_1\varepsilon_{t-1} + a_1^2\varepsilon_{t-2} + \cdots$. Combining these two parts of the solution, we obtain the
particular solution given by (1.21).

For practice, we can use lag operators to solve (1.67): $y_t = a_0 + a_1y_{t-1} + \varepsilon_t +
\beta_1\varepsilon_{t-1}$, where $|a_1| < 1$. Use property 2 to form $(1 - a_1L)y_t = a_0 + (1 + \beta_1L)\varepsilon_t$. Solving
for $y_t$ yields

$y_t = [a_0 + (1 + \beta_1L)\varepsilon_t]/(1 - a_1L)$


so that
$y_t = [a_0/(1 - a_1)] + [\varepsilon_t/(1 - a_1L)] + [\beta_1\varepsilon_{t-1}/(1 - a_1L)]$    (1.80)


Expanding the last two terms of (1.80) yields the same solution found using the method
of undetermined coefficients.
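Now suppose $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$ but $|a_1| > 1$. The application of property 5 to
(1.79) is inappropriate because it implies that $y_t$ is infinite. Instead, expand (1.79) using
property 6:

$$y_t = \frac{a_0}{1 - a_1} - (a_1L)^{-1}\sum_{i=0}^{\infty}(a_1L)^{-i}\varepsilon_t \tag{1.81}$$

$$\phantom{y_t} = \frac{a_0}{1 - a_1} - \frac{1}{a_1}\sum_{i=0}^{\infty}(a_1L)^{-i}\varepsilon_{t+1}$$

$$\phantom{y_t} = \frac{a_0}{1 - a_1} - \frac{1}{a_1}\sum_{i=0}^{\infty}a_1^{-i}\varepsilon_{t+1+i} \tag{1.82}$$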
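As a rough check on the forward-looking form in (1.82), the sketch below evaluates it on a finite set of simulated shocks and confirms that it satisfies the original difference equation. The function name and parameter values are hypothetical, and shocks beyond the end of the sample are assumed to be zero so that the infinite sum becomes finite.

```python
import numpy as np

def forward_solution(a0, a1, eps, t):
    """y_t = a0/(1 - a1) - (1/a1) * sum_{i>=0} a1**(-i) * eps_{t+1+i}.

    Shocks after the end of `eps` are assumed to be zero (an assumption
    made only so that the sum is finite)."""
    future = eps[t + 1:]                               # eps_{t+1}, eps_{t+2}, ...
    weights = float(a1) ** (-np.arange(len(future)))   # 1, 1/a1, 1/a1**2, ...
    return a0 / (1.0 - a1) - (weights @ future) / a1

# Hypothetical values with |a1| > 1.
a0, a1 = 1.0, 2.0
eps = np.random.default_rng(0).normal(size=50)
t = 10
lhs = forward_solution(a0, a1, eps, t)
rhs = a0 + a1 * forward_solution(a0, a1, eps, t - 1) + eps[t]
print(abs(lhs - rhs))   # close to zero: the forward solution satisfies the equation
```

With $|a_1| > 1$ the weights $a_1^{-i}$ shrink as the shocks move further into the future, so the forward-looking sum is well behaved even though the backward-looking one is not.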


Lag Operators in Higher Order Systems 


We can also use lag operators to transform the nth-order equation
$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \cdots + a_ny_{t-n} + \varepsilon_t$ into

$$(1 - a_1L - a_2L^2 - \cdots - a_nL^n)\,y_t = a_0 + \varepsilon_t$$

or

$$y_t = (a_0 + \varepsilon_t)/(1 - a_1L - a_2L^2 - \cdots - a_nL^n)$$

From our previous analysis (also see Appendix 1.2 in the Supplementary Manual),
we know that the stability condition is such that the characteristic roots of the
equation $\alpha^n - a_1\alpha^{n-1} - \cdots - a_n = 0$ all lie within the unit circle. Notice that the
values of $\alpha$ solving the characteristic equation are the reciprocals of the values of L that
solve the equation $1 - a_1L - \cdots - a_nL^n = 0$. In fact, the expression $1 - a_1L - \cdots - a_nL^n$
is often called the inverse characteristic equation. Thus, in the literature, it is often
stated that the stability condition is for the characteristic roots of $(1 - a_1L - \cdots - a_nL^n)$
to lie outside of the unit circle.
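The reciprocal relationship between the two sets of roots is easy to confirm numerically. The following is a minimal sketch, assuming NumPy is available; the function names are hypothetical, and the coefficients $a_1 = 0.75$, $a_2 = -0.125$ are used only as an example.

```python
import numpy as np

def characteristic_roots(a):
    """Roots of alpha^n - a1*alpha^(n-1) - ... - an = 0."""
    a = np.asarray(a, dtype=float)
    return np.roots(np.concatenate(([1.0], -a)))

def inverse_characteristic_roots(a):
    """Roots in L of 1 - a1*L - ... - an*L^n = 0."""
    a = np.asarray(a, dtype=float)
    # np.roots expects the coefficient on the highest power first.
    return np.roots(np.concatenate((-a[::-1], [1.0])))

a = [0.75, -0.125]
alpha = characteristic_roots(a)
L_roots = inverse_characteristic_roots(a)
print(np.sort(np.abs(alpha)))          # moduli inside the unit circle
print(np.sort(np.abs(1.0 / L_roots)))  # the same moduli, from the reciprocals
print(bool(np.all(np.abs(alpha) < 1))) # stability check
```

Either form of the condition can be used; the only thing to keep straight is whether the roots being reported are those of the characteristic equation (inside the unit circle) or of the inverse characteristic equation (outside it).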

In principle, one could use lag operators to actually obtain the coefficients of
the particular solution. To illustrate using the second-order case, consider
$y_t = (a_0 + \varepsilon_t)/(1 - a_1L - a_2L^2)$. If we knew the factors of the quadratic equation were such that
$(1 - a_1L - a_2L^2) = (1 - b_1L)(1 - b_2L)$, we could write

$$y_t = (a_0 + \varepsilon_t)/[(1 - b_1L)(1 - b_2L)]$$

If both $b_1$ and $b_2$ are less than unity in absolute value, we can apply property 5 to obtain

$$y_t = \frac{a_0/(1 - b_1) + \sum_{i=0}^{\infty} b_1^i\varepsilon_{t-i}}{1 - b_2L}$$

Reapply the rule to $a_0/(1 - b_1)$ and to each of the elements in the summation
$\sum b_1^i\varepsilon_{t-i}$ to obtain the particular solution. If you want to know the actual coefficients of the
process, it is preferable to use the method of undetermined coefficients. The beauty of
lag operators is that they can be used to denote such particular solutions succinctly. The
general model

$$A(L)y_t = a_0 + B(L)\varepsilon_t$$

has the particular solution

$$y_t = a_0/A(L) + B(L)\varepsilon_t/A(L)$$


As suggested by (1.82), there is a forward-looking solution to any linear dif- 
ference equation. This text will not make much use of the forward-looking solution 
since future realizations of stochastic variables are not directly observable. Some of 
the details of forward-looking solutions can be found in the Supplementary Manual to 
this text available at www.time-series.net and from Wiley. 


10. SUMMARY 


Time-series econometrics is concerned with the estimation of difference equations con- 
taining stochastic components. Originally, time-series models were used for forecast- 
ing. Uncovering the dynamic path of a series improves forecasts because the predictable 
components of the series can be extrapolated into the future. The growing interest in 
economic dynamics has given a new emphasis to time-series econometrics. Stochastic 
difference equations arise quite naturally from dynamic economic models. Appropri- 
ately estimated equations can be used for the interpretation of economic data and for 
hypothesis testing. 

This introductory chapter focused on methods of “solving” stochastic difference 
equations. Although iteration can be useful, it is impractical in many circumstances. 
The solution to a linear difference equation can be divided into two parts: a particu- 
lar solution and a homogeneous solution. One complicating factor is that the homo- 
geneous solution is not unique. The general solution is a linear combination of the 
particular solution and all homogeneous solutions. Imposing n initial conditions on the 
general solution of an nth-order equation yields a unique solution. 

The homogeneous portion of a difference equation is a measure of the disequilib- 
rium in the initial period(s). The homogeneous equation is especially important in that 
it yields the characteristic roots; an nth-order equation has n such characteristic roots. If 
all of the characteristic roots lie within the unit circle, the series will be convergent. As 
you will see in Chapter 2, there is a direct relationship between the stability conditions 
and the issue of whether an economic variable is stationary or nonstationary. 

The method of undetermined coefficients and the use of lag operators are powerful 
tools for obtaining the particular solution. The particular solution will be a linear func- 
tion of the current and past values of the forcing process. In addition, this solution may 
contain an intercept term and a polynomial function of time. Unit roots and character- 
istic roots outside of the unit circle require the imposition of an initial condition for the 
particular solution to be meaningful. Some economic models allow for forward-looking 
solutions; in such circumstances, anticipated future events have consequences for the 
present period. 
The tools developed in this chapter are aimed at paving the way for the study of
time-series econometrics. It is a good idea to work all of the exercises presented below. 
Characteristic roots, the method of undetermined coefficients, and lag operators will 
be encountered throughout the remainder of the text. 


QUESTIONS AND EXERCISES 


1. Consider the difference equation $y_t = a_0 + a_1y_{t-1}$ with the initial condition $y_0$. Jill solved the
difference equation by iterating backward:

$$y_t = a_0 + a_1y_{t-1}$$
$$\phantom{y_t} = a_0 + a_1(a_0 + a_1y_{t-2})$$
$$\phantom{y_t} = a_0 + a_0a_1 + a_0a_1^2 + \cdots + a_0a_1^{t-1} + a_1^ty_0$$

Bill added the homogeneous and particular solutions to obtain
$y_t = a_0/(1 - a_1) + a_1^t[y_0 - a_0/(1 - a_1)]$.
a. Show that the two solutions are identical for $|a_1| < 1$.
b. Show that for $a_1 = 1$, Jill's solution is equivalent to $y_t = a_0t + y_0$. How would you use
Bill's method to arrive at this same conclusion in the case that $a_1 = 1$?

2. The cobweb model in Section 5 assumed static price expectations. Consider an alternative
formulation called adaptive expectations. Let the expected price in t (denoted by $p_t^*$) be
a weighted average of the price in t - 1 and the price expectation of the previous period.
Formally,

$$p_t^* = \alpha p_{t-1} + (1 - \alpha)p_{t-1}^*, \qquad 0 < \alpha \le 1$$

Clearly, when $\alpha = 1$, the static and adaptive expectations schemes are equivalent. An
interesting feature of this model is that it can be viewed as a difference equation expressing the
expected price as a function of its own lagged value and the forcing variable $p_{t-1}$.
a. Find the homogeneous solution for $p_t^*$.
b. Use lag operators to find the particular solution. Check your answer by substituting it
into the original difference equation.
3. Suppose that the money supply process has the form $m_t = m + \rho m_{t-1} + \varepsilon_t$, where m is a
constant and $0 < \rho < 1$.
a. Show that it is possible to express $m_{t+n}$ in terms of the known value $m_t$ and the sequence
$\{\varepsilon_{t+1}, \varepsilon_{t+2}, \ldots, \varepsilon_{t+n}\}$.
b. Suppose that all values of $\varepsilon_{t+i}$ for i > 0 have a mean value of zero. Explain how you
could use your result in part a to forecast the money supply n periods into the future.
4. The unit root problem in time-series econometrics is concerned with characteristic roots that
are equal to unity. In order to preview the issue:

a. Find the homogeneous solution to each of the following: (Hint: Each has at least one unit
root.)
i. $y_t = 1.5y_{t-1} - 0.5y_{t-2} + \varepsilon_t$          iii. $y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t$
ii. $y_t = y_{t-2} + \varepsilon_t$                          iv. $y_t = y_{t-1} + 0.25y_{t-2} - 0.25y_{t-3} + \varepsilon_t$

b. Show that each of the backward-looking solutions is not convergent.
c. Show that Equation i can be written entirely in first differences; that is,
$\Delta y_t = 0.5\Delta y_{t-1} + \varepsilon_t$. Find the particular solution for $\Delta y_t$.
d. Similarly transform the other equations into their first-difference form. Find the
particular solution, if it exists, for the transformed equations.
e. Write equations i through iv using lag operators.
f. Given an initial condition $y_0$, find the solution for $y_t = a_0 - y_{t-1} + \varepsilon_t$.

5. a. For each of the following, calculate the characteristic roots and the discriminant d in
order to describe the adjustment process.

i. $y_t = 0.75y_{t-1} - 0.125y_{t-2}$          iii. $y_t = 1.8y_{t-1} - 0.81y_{t-2}$
ii. $y_t = 1.5y_{t-1} - 0.75y_{t-2}$           iv. $y_t = 1.5y_{t-1} - 0.5625y_{t-2}$

b. Suppose $y_1 = y_2 = 10$. Use a spreadsheet program or a statistical software package to
calculate and plot the next 25 realizations of the series above.


6. Use the method detailed at the end of Section 1.8 to find the general solutions for

a. $y_t = 1 + 0.7y_{t-1} - 0.1y_{t-2} + \varepsilon_t$
b. $y_t = 1 - 0.3y_{t-1} + 0.1y_{t-2} + \varepsilon_t$


7. Consider the stochastic process $y_t = a_0 + a_2y_{t-2} + \varepsilon_t$.

a. Find the homogeneous solution and determine the stability condition.
b. Find the particular solution using the method of undetermined coefficients.
c. Find the particular solution using lag operators.


8. For each of the following, verify that the posited solution satisfies the difference equation.
The symbols c, $c_0$, and $a_0$ denote constants.

   Equation                              Solution
a. $y_t - y_{t-1} = 0$                   $y_t = c$
b. $y_t - y_{t-1} = a_0$                 $y_t = c + a_0t$
c. $y_t - y_{t-2} = 0$                   $y_t = c + c_0(-1)^t$
d. $y_t - y_{t-2} = \varepsilon_t$       $y_t = c + c_0(-1)^t + \varepsilon_t + \varepsilon_{t-2} + \varepsilon_{t-4} + \cdots$


9. Part 1: For each of the following, determine whether $\{y_t\}$ represents a stable process.
Determine whether the characteristic roots are real or imaginary and whether the real parts
are positive or negative.
a. $y_t - 1.2y_{t-1} + 0.2y_{t-2}$
b. $y_t - 1.2y_{t-1} + 0.4y_{t-2}$
c. $y_t - 1.2y_{t-1} - 1.2y_{t-2}$
d. $y_t + 1.2y_{t-1}$
e. $y_t - 0.7y_{t-1} - 0.25y_{t-2} + 0.175y_{t-3} = 0$
[Hint: $(x - 0.5)(x + 0.5)(x - 0.7) = x^3 - 0.7x^2 - 0.25x + 0.175$.]
Part 2: Write each of the above equations using lag operators. Determine the
characteristic roots of the inverse characteristic equation.


10. Consider the stochastic difference equation

$$y_t = 0.8y_{t-1} + \varepsilon_t - 0.5\varepsilon_{t-1}$$

a. Suppose that the initial conditions are such that $y_0 = 0$ and $\varepsilon_0 = \varepsilon_{-1} = 0$. Now suppose
that $\varepsilon_1 = 1$. Determine the values $y_1$ through $y_5$ by forward iteration.
b. Find the homogeneous and particular solutions.
c. Impose the initial conditions in order to obtain the general solution.
d. Trace out the time path of an $\varepsilon_t$ shock on the entire time path of the $\{y_t\}$ sequence.

11. Use equation (1.5) to determine the restrictions on $\alpha$ and $\beta$ necessary to ensure that the $\{y_t\}$
process is stable.
12. Consider the following two stochastic difference equations
i. $y_t = 3 + 0.75y_{t-1} - 0.125y_{t-2} + \varepsilon_t$          ii. $y_t = 3 + 0.25y_{t-1} + 0.375y_{t-2} + \varepsilon_t$

a. Use the method of undetermined coefficients to find the particular solution for each
equation.

b. Find the homogeneous solutions for each equation.

c. For each process, suppose that $y_0 = y_1 = 8$ and that all values of $\varepsilon_t$ for
t = 1, 0, -1, -2, ... are equal to zero. Use the method illustrated by equations (1.75) and
(1.76) to find the values of the constants $A_1$ and $A_2$.

13. Although it is not the simplest solution method, it is possible to use the method of
undetermined coefficients when you are given initial conditions. Consider the model
$y_t = 0.75y_{t-1} + \varepsilon_t$ where $y_0$ is given. From equations (1.18) and (1.66) you know that the
solution for $y_t$ has the form
$y_t = \varepsilon_t + \alpha_1\varepsilon_{t-1} + \alpha_2\varepsilon_{t-2} + \alpha_3\varepsilon_{t-3} + \cdots + \alpha_{t-1}\varepsilon_1 + \alpha_ty_0$ where
the $\alpha_i$ are the undetermined coefficients.

a. Show that the solution for $y_{t-1}$ has the form
$y_{t-1} = \varepsilon_{t-1} + \alpha_1\varepsilon_{t-2} + \alpha_2\varepsilon_{t-3} + \alpha_3\varepsilon_{t-4} + \cdots + \alpha_{t-2}\varepsilon_1 + \alpha_{t-1}y_0$.

b. Substitute the challenge solutions for $y_t$ and $y_{t-1}$ into $y_t = 0.75y_{t-1} + \varepsilon_t$ to find the values
of the $\alpha_i$.

c. How would you use the method of undetermined coefficients to solve the second-order
process $y_t = 0.75y_{t-1} - 0.125y_{t-2} + \varepsilon_t$ where $y_0$ and $y_1$ are given?


CHAPTER 2 


STATIONARY TIME-SERIES 
MODELS 


Learning Objectives
1. Describe the nature of stochastic linear difference equations.
2. Develop the tools used in estimating ARMA models.
3. Consider the time-series properties of stationary and nonstationary models.
4. Consider various test statistics to check for model adequacy. Several
examples of estimated ARMA models are analyzed in detail. It is shown
how a properly estimated model can be used for forecasting.
5. Derive the theoretical autocorrelation function for various ARMA processes.
6. Derive the theoretical partial autocorrelation function for various ARMA
processes.
7. Show how the Box-Jenkins methodology relies on the autocorrelations and
partial autocorrelations in model selection.
8. Develop the complete set of tools for Box-Jenkins model selection.
9. Examine the properties of time-series forecasts.
10. Illustrate the Box-Jenkins methodology using a model of the term structure
of interest rates.
11. Show how to model series containing seasonal factors.
12. Develop diagnostic testing for model adequacy.
13. Show that combined forecasts typically outperform forecasts from a single
model.


1. STOCHASTIC DIFFERENCE EQUATION MODELS


In this chapter, we continue to work with discrete, rather than continuous, time-series
models. Recall from the discussion in Chapter 1 that we can evaluate the function
$y = f(t)$ at $t_0$ and $t_0 + h$ to form

$$\Delta y = f(t_0 + h) - f(t_0)$$

As a practical matter, most economic time-series data are collected for discrete
time periods. Thus, we consider only the equidistant intervals $t_0, t_0 + h, t_0 + 2h, t_0 + 3h, \ldots$
and conveniently set h = 1. Be careful to recognize, however, that a discrete time
series implies that t, but not necessarily $y_t$, is discrete. For example, although Scotland's
annual rainfall is a continuous variable, the sequence of such annual rainfall totals for
years 1 through t is a discrete time series. In many economic applications, t refers to
"time" so that h represents the change in time. However, t need not refer to the type of
time interval as measured by a clock or calendar. Instead of allowing our measurement
units to be minutes, days, quarters, or years, t can refer to an ordered event number. We
could let $y_t$ denote the outcome of spin t on a roulette wheel; $y_t$ can then take on any of
the 38 values 00, 0, 1, ..., 36.

A discrete variable y is said to be a random variable (i.e., stochastic) if, for any
real number r, there exists a probability $p(y \le r)$ that y will take on a value less than
or equal to r. This definition is fairly general; in common usage, it is typically implied
that there is at least one value of r for which $0 < p(y = r) < 1$. If there is some r for
which $p(y = r) = 1$, y is deterministic rather than random.

It is useful to consider the elements of an observed time series $\{y_0, y_1, y_2, \ldots, y_t\}$ as
being realizations (i.e., outcomes) of a stochastic process. As in Chapter 1, we continue
to let the notation $y_t$ refer to an element of the entire sequence $\{y_t\}$. In our roulette
example, $y_t$ denotes the outcome of spin t on a roulette wheel. If we observe spins 1
through T, we can form the sequence $y_1, y_2, \ldots, y_T$ or, more compactly, $\{y_t\}$. In the
same way, the term $y_t$ could be used to denote gross domestic product (GDP) in time
period t. Since we cannot forecast GDP perfectly, $y_t$ is a random variable. Once we learn
the value of GDP in period t, $y_t$ becomes one of the realized values from a stochastic
process. (Of course, measurement error may prevent us from ever knowing the "true"
value of GDP.)

For discrete variables, the probability distribution of $y_t$ is given by a formula (or
table) that specifies each possible realized value of $y_t$ and the probability associated
with that realization. If the realizations are linked across time, there exists the joint
probability distribution $p(y_1 = r_1, y_2 = r_2, \ldots, y_T = r_T)$ where $r_i$ is the realized value
of y in period i. Having observed the first t realizations, we can form the expected value
of $y_{t+1}, y_{t+2}, \ldots$, conditioned on the observed values of $y_1$ through $y_t$. This conditional
mean, or expected value, of $y_{t+i}$ is denoted by $E_t[y_{t+i} \mid y_t, y_{t-1}, \ldots, y_1]$ or $E_ty_{t+i}$.

Of course, if $y_t$ refers to the outcome of spinning a fair roulette wheel, the probability
distribution is easily characterized. In contrast, we may never be able to completely
describe the probability distribution for GDP. Nevertheless, the task of economic theorists
is to develop models that capture the essence of the true data-generating process.
Stochastic difference equations are one convenient way of modeling dynamic economic
processes. To take a simple example, suppose that the Federal Reserve's money supply
target grows 3% each year. Hence,

$$m_t^* = 1.03\,m_{t-1}^* \tag{2.1}$$

so that, given the initial condition $m_0^*$, the particular solution is

$$m_t^* = (1.03)^tm_0^*$$

where $m_t^*$ = the money supply target in year t;
$m_0^*$ = the initial condition for the target money supply in period zero.
Of course, the actual money supply ($m_t$) and the target need not be equal. Suppose
that, at the end of period t - 1, there exist $m_{t-1}$ outstanding dollars that are carried
forward into period t. Hence, at the beginning of t, there are $m_{t-1}$ dollars so that the
gap between the target and the actual money supply is $m_t^* - m_{t-1}$. Suppose that the Fed
cannot perfectly control the money supply but attempts to change the money supply by
$\rho$ percent ($\rho < 100\%$) of any gap between the desired and actual money supply. We
can model this behavior as

$$\Delta m_t = \rho[m_t^* - m_{t-1}] + \varepsilon_t$$

or using (2.1), we obtain

$$m_t = \rho(1.03)^tm_0^* + (1 - \rho)m_{t-1} + \varepsilon_t \tag{2.2}$$

where $\varepsilon_t$ is the uncontrollable portion of the money supply.

We assume that the mean of $\varepsilon_t$ is zero in all time periods.

Although the economic theory is overly simple, the model does illustrate the key 
points discussed earlier. Note the following: 


1. Although the money supply is a continuous variable, (2.2) is a discrete difference
equation. Since the forcing process $\{\varepsilon_t\}$ is stochastic, the money supply
is stochastic; we can call (2.2) a linear stochastic difference equation.

2. If we knew the distribution of $\{\varepsilon_t\}$, we could calculate the distribution for
each element in the $\{m_t\}$ sequence. Since (2.2) shows how the realizations
of the $\{m_t\}$ sequence are linked across time, we would be able to calculate
the various joint probabilities. Notice that the distribution of the money supply
sequence is completely determined by the parameters of the difference
equation (2.2) and the distribution of the $\{\varepsilon_t\}$ sequence.

3. Having observed the first t observations in the $\{m_t\}$ sequence, we can
make forecasts of $m_{t+1}, m_{t+2}, \ldots$. For example, updating (2.2) by one
period and taking the conditional expectation, the forecast of $m_{t+1}$ is
$E_tm_{t+1} = \rho(1.03)^{t+1}m_0^* + (1 - \rho)m_t$.
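The three points above can be illustrated with a short simulation of (2.2). This is only a sketch under stated assumptions: the function names, the normally distributed shocks, and the parameter values ($\rho = 0.5$, an initial target of 100) are hypothetical choices, not values taken from the text.

```python
import numpy as np

def simulate_money_supply(rho, m0_target, m_init, n_periods, sigma, seed=0):
    """Simulate m_t = rho*(1.03)**t * m0_target + (1 - rho)*m_{t-1} + eps_t.

    The shocks eps_t are assumed to be N(0, sigma**2); the text only
    requires that they have a zero mean."""
    rng = np.random.default_rng(seed)
    m = np.empty(n_periods + 1)
    m[0] = m_init
    for t in range(1, n_periods + 1):
        target = 1.03 ** t * m0_target          # m*_t from (2.1)
        m[t] = rho * target + (1 - rho) * m[t - 1] + rng.normal(0.0, sigma)
    return m

def one_step_forecast(rho, m0_target, m_t, t):
    """E_t m_{t+1} = rho*(1.03)**(t+1) * m0_target + (1 - rho)*m_t."""
    return rho * 1.03 ** (t + 1) * m0_target + (1 - rho) * m_t

m = simulate_money_supply(rho=0.5, m0_target=100.0, m_init=100.0,
                          n_periods=20, sigma=1.0)
forecast = one_step_forecast(0.5, 100.0, m[10], t=10)
print(forecast, m[11])   # the forecast differs from the realization by eps_11
```

The forecast error is exactly the uncontrollable shock, which is why the forecast is unbiased when the shocks have a zero mean.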


Before we proceed too far along these lines, let us go back to the basic building
block of discrete stochastic time-series models: the white-noise process. A sequence
$\{\varepsilon_t\}$ is a white-noise process if each value in the sequence has a mean of zero, a constant
variance, and is uncorrelated with all other realizations. Formally, if the notation E(x)
denotes the theoretical mean value of x, the sequence $\{\varepsilon_t\}$ is a white-noise process if
for each time period t

$$E(\varepsilon_t) = E(\varepsilon_{t-1}) = \cdots = 0$$
$$E(\varepsilon_t^2) = E(\varepsilon_{t-1}^2) = \cdots = \sigma^2 \quad [\text{or } \mathrm{var}(\varepsilon_t) = \mathrm{var}(\varepsilon_{t-1}) = \cdots = \sigma^2]$$
$$E(\varepsilon_t\varepsilon_{t-s}) = E(\varepsilon_{t-j}\varepsilon_{t-j-s}) = 0 \text{ for all } j \text{ and all } s \ne 0 \quad [\text{or } \mathrm{cov}(\varepsilon_t, \varepsilon_{t-s}) = \mathrm{cov}(\varepsilon_{t-j}, \varepsilon_{t-j-s}) = 0]$$

In the remainder of this text, $\{\varepsilon_t\}$ will always refer to a white-noise process and
$\sigma^2$ will refer to the variance of that process. When it is necessary to refer to two or
more white-noise processes, symbols such as $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ will be used. Now, use
a white-noise process to construct the more interesting time series

$$x_t = \sum_{i=0}^{q}\beta_i\varepsilon_{t-i} \tag{2.3}$$

For each period t, $x_t$ is constructed by taking the values $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$ and
multiplying each by the associated value of $\beta_i$. A sequence formed in this manner is called
a moving average of order q and is denoted by MA(q). To illustrate a typical moving
average process, suppose you win $1 if a fair coin shows a head and lose $1 if it shows
a tail. Denote the outcome on toss t by $\varepsilon_t$ (i.e., for toss t, $\varepsilon_t$ is either +$1 or -$1). If
you want to keep track of your hot streaks, you might want to calculate your average
winnings on the last four tosses. For each coin toss t, your average payoff on the last
four tosses is $\tfrac{1}{4}\varepsilon_t + \tfrac{1}{4}\varepsilon_{t-1} + \tfrac{1}{4}\varepsilon_{t-2} + \tfrac{1}{4}\varepsilon_{t-3}$. In terms of (2.3), this sequence is
a moving average process such that $\beta_i = 0.25$ for $i \le 3$ and zero otherwise.

Although the $\{\varepsilon_t\}$ sequence is a white-noise process, the constructed $\{x_t\}$ sequence
will not be a white-noise process if two or more of the $\beta_i$ differ from zero. To illustrate
using an MA(1) process, set $\beta_0 = 1$, $\beta_1 = 0.5$, and all other $\beta_i = 0$. In this circumstance,
$E(x_t) = E(\varepsilon_t + 0.5\varepsilon_{t-1}) = 0$ and $\mathrm{var}(x_t) = \mathrm{var}(\varepsilon_t + 0.5\varepsilon_{t-1}) = 1.25\sigma^2$. You can easily
convince yourself that $E(x_t) = E(x_{t-s})$ and that $\mathrm{var}(x_t) = \mathrm{var}(x_{t-s})$ for all s. Hence,
the first two conditions for $\{x_t\}$ to be a white-noise process are satisfied. However,
$E(x_tx_{t-1}) = E[(\varepsilon_t + 0.5\varepsilon_{t-1})(\varepsilon_{t-1} + 0.5\varepsilon_{t-2})] = E[\varepsilon_t\varepsilon_{t-1} + 0.5(\varepsilon_{t-1})^2 + 0.5\varepsilon_t\varepsilon_{t-2} + 0.25\varepsilon_{t-1}\varepsilon_{t-2}] = 0.5\sigma^2$.
Given that there exists a value of $s \ne 0$ such that $E(x_tx_{t-s}) \ne 0$,
the $\{x_t\}$ sequence is not a white-noise process.
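The MA(1) calculation generalizes: because the $\varepsilon_t$ are uncorrelated with common variance $\sigma^2$, only products of shocks dated in the same period survive the expectation, so $E(x_tx_{t-s}) = \sigma^2\sum_i\beta_i\beta_{i+s}$. The sketch below simply evaluates that sum; the function name is a hypothetical choice made for the illustration.

```python
import numpy as np

def ma_autocovariance(beta, s, sigma2=1.0):
    """Autocovariance E(x_t * x_{t-s}) of x_t = sum_i beta[i] * eps_{t-i},
    where {eps_t} is white noise with variance sigma2."""
    beta = np.asarray(beta, dtype=float)
    s = abs(s)
    if s >= len(beta):
        return 0.0                      # no overlapping shocks beyond lag q
    # Pair beta_i with beta_{i+s}: the shocks the two terms have in common.
    return sigma2 * float(beta[: len(beta) - s] @ beta[s:])

beta = [1.0, 0.5]                       # the MA(1) example in the text
print(ma_autocovariance(beta, 0))       # variance in units of sigma^2
print(ma_autocovariance(beta, 1))       # first-order autocovariance
print(ma_autocovariance(beta, 2))       # zero beyond lag q = 1
```

A nonzero value at any lag $s \ne 0$ is enough to rule out white noise, which is exactly the argument made for the MA(1) case.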

Exercise 1 at the end of this chapter asks you to find the mean, variance, and
covariance of your hot streaks in coin tossing. For practice, you should work that
exercise before continuing. If you are a bit "rusty" on the algebra of finding means,
variances, and covariances, you should also work through Exercises 2 and 3 and consult
Section 2.3 of the Supplementary Manual to this text.


2. ARMA MODELS 


It is possible to combine a moving average process with a linear difference equation
to obtain an autoregressive moving average (ARMA) model. Consider the pth-order
difference equation

$$y_t = a_0 + \sum_{i=1}^{p}a_iy_{t-i} + x_t \tag{2.4}$$

Now let $\{x_t\}$ be the MA(q) process given by (2.3) so that we can write

$$y_t = a_0 + \sum_{i=1}^{p}a_iy_{t-i} + \sum_{i=0}^{q}\beta_i\varepsilon_{t-i} \tag{2.5}$$

We follow the convention of normalizing units so that $\beta_0$ is always equal to unity.
If the characteristic roots of (2.5) are all in the unit circle, $\{y_t\}$ is called an ARMA
model for $y_t$. The autoregressive part of the model is the difference equation given by
the homogeneous portion of (2.4) and the moving average part is the $\{x_t\}$ sequence.
If the homogeneous part of the difference equation contains p lags and the model for
$x_t$ contains q lags, the model is called an ARMA(p, q) model. If q = 0, the process is
called a pure autoregressive process denoted by AR(p), and if p = 0, the process is a
pure moving average process denoted by MA(q). In an ARMA model, it is perfectly
permissible to allow p and/or q to be infinite. In this chapter, we consider only models
in which all of the characteristic roots of (2.5) are within the unit circle. However, if one
or more characteristic roots of (2.5) is greater than or equal to unity, the $\{y_t\}$ sequence is
said to be an integrated process and (2.5) is called an autoregressive integrated moving
average (ARIMA) model.

Treating (2.5) as a difference equation suggests that we can "solve" for $y_t$ in terms
of the $\{\varepsilon_t\}$ sequence. The solution of an ARMA(p, q) model expressing $y_t$ in terms of
the $\{\varepsilon_t\}$ sequence is the moving average representation of $y_t$. The procedure is no
different from that discussed in Chapter 1. For the AR(1) model $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$,
the moving average representation was shown to be

$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty}a_1^i\varepsilon_{t-i}$$

For the general ARMA(p, q) model, rewrite (2.5) using lag operators so that

$$\left(1 - \sum_{i=1}^{p}a_iL^i\right)y_t = a_0 + \sum_{i=0}^{q}\beta_i\varepsilon_{t-i}$$

so that the particular solution for $y_t$ is

$$y_t = \left(a_0 + \sum_{i=0}^{q}\beta_i\varepsilon_{t-i}\right)\Big/\left(1 - \sum_{i=1}^{p}a_iL^i\right) \tag{2.6}$$

Fortunately, it will not be necessary for us to expand (2.6) to obtain the specific
coefficient for each element in $\{\varepsilon_t\}$. The important point to recognize is that the expansion
will yield an MA($\infty$) process. The issue is whether such an expansion is convergent
so that the stochastic difference equation given by (2.6) is stable. As you will see in
Section 3, the stability condition is that the roots of the polynomial $(1 - \sum a_iL^i)$ must
lie outside the unit circle. It is also shown that if $y_t$ is a linear stochastic difference
equation, the stability condition is a necessary condition for the time series $\{y_t\}$ to be
stationary.


3. STATIONARITY 


Suppose that the quality control division of a manufacturing firm samples four
machines each hour. Every hour, quality control finds the mean of the machines'
output levels. The plot of each machine's hourly output is shown in Figure 2.1. If $y_{it}$
represents machine i's output at hour t, the means ($\bar{y}_t$) are readily calculated as

$$\bar{y}_t = \sum_{i=1}^{4}y_{it}/4$$

For hours 5, 10, and 15, these mean values are 4.61, 5.14, and 5.03, respectively.

FIGURE 2.1 Hourly Output of Four Machines


The sample variance for each hour can similarly be constructed. Unfortunately,
applied econometricians do not usually have the luxury of being able to obtain an
ensemble (i.e., multiple time-series data of the same process over the same time
period). Typically, we observe only one set of realizations for any particular series.
Fortunately, if $\{y_t\}$ is a stationary series, the mean, variance, and autocorrelations can
usually be well approximated by sufficiently long time averages based on the single
set of realizations. Suppose that you observed only the output of machine 1 for 20
periods. If you knew that the output was stationary, you could approximate the mean
level of output by

$$\bar{y} = \sum_{t=1}^{20}y_{1t}/20$$


In using this approximation, you would be assuming that the mean was the same
for each period. Formally, a stochastic process having a finite mean and variance is
covariance stationary if for all t and t - s,

$$E(y_t) = E(y_{t-s}) = \mu \tag{2.7}$$
$$E[(y_t - \mu)^2] = E[(y_{t-s} - \mu)^2] = \sigma_y^2 \quad [\mathrm{var}(y_t) = \mathrm{var}(y_{t-s}) = \sigma_y^2] \tag{2.8}$$
$$E[(y_t - \mu)(y_{t-s} - \mu)] = E[(y_{t-j} - \mu)(y_{t-j-s} - \mu)] = \gamma_s \quad [\mathrm{cov}(y_t, y_{t-s}) = \mathrm{cov}(y_{t-j}, y_{t-j-s}) = \gamma_s] \tag{2.9}$$

where $\mu$, $\sigma_y^2$, and $\gamma_s$ are all constants.
In (2.9), allowing s = 0 means that $\gamma_0$ is equivalent to the variance of $y_t$. Simply put,
a time series is covariance stationary if its mean and all autocovariances are unaffected
by a change of time origin. In the literature, a covariance-stationary process is also
referred to as a weakly stationary, second-order stationary, or wide-sense stationary
process. (Note that a strongly stationary process need not have a finite mean and/or
variance.) The text considers only covariance-stationary series so that there is no
ambiguity in using the terms stationary and covariance stationary interchangeably. One
further word about terminology: In multivariate models, the term autocovariance is
reserved for the covariance between $y_t$ and its own lags. Cross-covariance refers to the
covariance between one series and another. In univariate time-series models, there is
no ambiguity, and the terms autocovariance and covariance are used interchangeably.

For a covariance-stationary series, we can define the autocorrelation between $y_t$
and $y_{t-s}$ as

$$\rho_s \equiv \gamma_s/\gamma_0$$

where $\gamma_0$ and $\gamma_s$ are defined by (2.9).

Since $\gamma_s$ and $\gamma_0$ are time independent, the autocorrelation coefficients $\rho_s$ are also
time independent. Although the autocorrelation between $y_t$ and $y_{t-1}$ can differ from the
autocorrelation between $y_t$ and $y_{t-2}$, the autocorrelation between $y_t$ and $y_{t-1}$ must be
identical to that between $y_{t-s}$ and $y_{t-s-1}$. Obviously, $\rho_0 = 1$.
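When only one realization is available, the autocorrelations are approximated by time averages. The following is a minimal sketch of one common sample analogue of $\rho_s = \gamma_s/\gamma_0$; the function name is hypothetical, and dividing every autocovariance by the full sample size T is an assumption (other divisors are also used in practice).

```python
import numpy as np

def sample_autocorrelation(y, max_lag):
    """Sample analogue of rho_s = gamma_s / gamma_0 for s = 0, ..., max_lag.

    Each sample autocovariance is divided by the full sample size T,
    which is one common convention and is assumed here."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()                    # deviations from the time average
    gamma0 = dev @ dev / len(y)           # sample variance
    rho = [1.0]                           # rho_0 = 1 by construction
    for s in range(1, max_lag + 1):
        gamma_s = dev[s:] @ dev[:-s] / len(y)
        rho.append(gamma_s / gamma0)
    return np.array(rho)

# A small illustrative series (hypothetical data).
print(sample_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2))
```

These time averages are meaningful estimates of $\rho_s$ only if the series is stationary, since otherwise there is no single $\gamma_s$ for them to approximate.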


Stationarity Restrictions for an AR(1) Process 


For expositional convenience, consider the necessary and sufficient conditions for an
AR(1) process to be stationary. Let

$$y_t = a_0 + a_1y_{t-1} + \varepsilon_t$$

where $\varepsilon_t$ = white noise.

Suppose that the process started in period zero, so that $y_0$ is a deterministic initial
condition. In Section 3 of Chapter 1, it was shown that the solution to this equation is
(see also Question 4 at the end of this chapter)

$$y_t = a_0\sum_{i=0}^{t-1}a_1^i + a_1^ty_0 + \sum_{i=0}^{t-1}a_1^i\varepsilon_{t-i} \tag{2.10}$$

Taking the expected value of (2.10), we obtain

$$Ey_t = a_0\sum_{i=0}^{t-1}a_1^i + a_1^ty_0 \tag{2.11}$$

Updating by s periods yields

$$Ey_{t+s} = a_0\sum_{i=0}^{t+s-1}a_1^i + a_1^{t+s}y_0 \tag{2.12}$$

Comparing (2.11) and (2.12), it is clear that both means are time dependent. Since
$Ey_t$ is not equal to $Ey_{t+s}$, the sequence cannot be stationary. However, if t is large,
we can consider the limiting value of $y_t$ in (2.10). If $|a_1| < 1$, the expression $(a_1)^ty_0$
converges to zero as t becomes infinitely large and the sum
$a_0[1 + a_1 + (a_1)^2 + (a_1)^3 + \cdots]$ converges to $a_0/(1 - a_1)$. Thus, as $t \to \infty$ and if $|a_1| < 1$,

$$\lim y_t = \frac{a_0}{1 - a_1} + \sum_{i=0}^{\infty}a_1^i\varepsilon_{t-i} \tag{2.13}$$

Now take the expectation of (2.13) so that, for sufficiently large values of t,
$Ey_t = a_0/(1 - a_1)$. Thus, the mean value of $y_t$ is finite and time independent so that
$Ey_t = Ey_{t-s} = a_0/(1 - a_1) \equiv \mu$ for all t. Turning to the variance, we find

$$E(y_t - \mu)^2 = E[(\varepsilon_t + a_1\varepsilon_{t-1} + (a_1)^2\varepsilon_{t-2} + \cdots)^2]$$
$$= \sigma^2[1 + (a_1)^2 + (a_1)^4 + \cdots] = \sigma^2/[1 - (a_1)^2]$$

which is also finite and time independent. Finally, it is easily demonstrated that the
limiting values of all autocovariances are finite and time independent:

$$E[(y_t - \mu)(y_{t-s} - \mu)] = E\{[\varepsilon_t + a_1\varepsilon_{t-1} + (a_1)^2\varepsilon_{t-2} + \cdots][\varepsilon_{t-s} + a_1\varepsilon_{t-s-1} + (a_1)^2\varepsilon_{t-s-2} + \cdots]\}$$
$$= \sigma^2(a_1)^s[1 + (a_1)^2 + (a_1)^4 + \cdots]$$
$$= \sigma^2(a_1)^s/[1 - (a_1)^2] \tag{2.14}$$


In summary, if we can use the limiting value of (2.10), the $\{y_t\}$ sequence will be
stationary. For any given $y_0$ and $|a_1| < 1$, it follows that t must be sufficiently large.
Thus, if a sample is generated by a process that has recently begun, the realizations
may not be stationary. It is for this very reason that many econometricians assume that
the data-generating process has been occurring for an infinitely long time. In practice,
the researcher must be wary of any data generated from a "new" process. For
example, $\{y_t\}$ could represent the daily change in the dollar/mark exchange rate beginning
immediately after the demise of the Bretton Woods fixed exchange rate system.
Such a series may not be stationary due to the fact that there were deterministic initial
conditions (exchange rate changes were essentially zero in the Bretton Woods era). The
careful researcher wishing to use stationary series might consider excluding some of
these earlier observations from the period of analysis.
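The role of the initial condition can be made concrete by evaluating the finite-t moments implied by (2.10) and comparing them with their limiting values. In the sketch below, the function names and the parameter values ($a_0 = 1$, $a_1 = 0.5$, $y_0 = 10$) are hypothetical; the finite-t variance $\sigma^2\sum_{i=0}^{t-1}a_1^{2i}$ follows from (2.10) because the shocks are uncorrelated.

```python
def expected_value(a0, a1, y0, t):
    """E y_t from (2.11): a0 * sum_{i=0}^{t-1} a1**i + a1**t * y0."""
    return a0 * sum(a1 ** i for i in range(t)) + a1 ** t * y0

def finite_variance(a1, sigma2, t):
    """var(y_t) implied by (2.10): sigma2 * sum_{i=0}^{t-1} a1**(2*i)."""
    return sigma2 * sum(a1 ** (2 * i) for i in range(t))

def limiting_mean(a0, a1):
    """a0 / (1 - a1), valid when |a1| < 1."""
    return a0 / (1.0 - a1)

def limiting_autocovariance(a1, s, sigma2):
    """sigma2 * a1**s / (1 - a1**2) from (2.14); s = 0 gives the variance."""
    return sigma2 * a1 ** s / (1.0 - a1 ** 2)

a0, a1, y0, sigma2 = 1.0, 0.5, 10.0, 1.0
for t in (1, 5, 20, 60):
    print(t, expected_value(a0, a1, y0, t), finite_variance(a1, sigma2, t))
print(limiting_mean(a0, a1), limiting_autocovariance(a1, 0, sigma2))
```

For small t both moments depend on t, so early observations from a recently started process behave differently from later ones; this is the reason for discarding initial observations.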

Little would change were we not given the initial condition. Without the initial
value $y_0$, the sum of the homogeneous and particular solutions for $y_t$ is

$$y_t = a_0/(1 - a_1) + \sum_{i=0}^{\infty}a_1^i\varepsilon_{t-i} + A(a_1)^t \tag{2.15}$$

where A is an arbitrary constant.

If you take the expectation of (2.15), it is clear that the $\{y_t\}$ sequence cannot be
stationary unless the expression $A(a_1)^t$ is equal to zero. Either the sequence must have
started infinitely long ago (so that $a_1^t = 0$) or the arbitrary constant A must be zero.
Recall that the arbitrary constant is interpreted as a deviation from long-run equilibrium.
The stability conditions can be stated succinctly:


1. The homogeneous solution must be zero. Either the sequence must have 
started infinitely far in the past or the process must always be in equilibrium 
(so that the arbitrary constant is zero). 


2. The characteristic root a, must be less than unity in absolute value. 


These two conditions readily generalize to all ARMA(p, q) processes. We know
that the homogeneous solution to (2.5) has the form

$$\sum_{i=1}^{p}A_i\alpha_i^t$$

or, if there are m repeated roots,

$$\alpha^t\sum_{i=1}^{m}A_it^{i-1} + \sum_{i=m+1}^{p}A_i\alpha_i^t$$

where the $A_i$ are all arbitrary constants, $\alpha$ is the repeated root, and the $\alpha_i$ are the distinct roots.

If any portion of the homogeneous equation is present, the mean, variance, and all 
covariances will be time dependent. Hence, for any ARMA(p, q) model, stationarity 
necessitates that the homogeneous solution be zero. Section 4 addresses the stationarity 
restrictions for the particular solution. 


4. STATIONARITY RESTRICTIONS FOR AN 
ARMA (p, q) MODEL 


As a prelude to the stationarity conditions for the general ARMA($p$, $q$) model, consider
the restrictions necessary to ensure that an ARMA(2, 1) model is stationary. Since the
magnitude of the intercept term does not affect the stability (or stationarity) conditions,
set $a_0 = 0$ and write

$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t + \beta_1 \varepsilon_{t-1} \tag{2.16}$$

From the previous section, we know that the homogeneous solution must be zero.
As such, it is only necessary to find the particular solution. Using the method of
undetermined coefficients, we can write the challenge solution as

$$y_t = \sum_{i=0}^{\infty} c_i \varepsilon_{t-i} \tag{2.17}$$

For (2.17) to be a solution of (2.16), the various $c_i$ must satisfy

$$\begin{aligned}
c_0 \varepsilon_t + c_1 \varepsilon_{t-1} + c_2 \varepsilon_{t-2} + c_3 \varepsilon_{t-3} + \cdots
 &= a_1 (c_0 \varepsilon_{t-1} + c_1 \varepsilon_{t-2} + c_2 \varepsilon_{t-3} + c_3 \varepsilon_{t-4} + \cdots) \\
 &\quad + a_2 (c_0 \varepsilon_{t-2} + c_1 \varepsilon_{t-3} + c_2 \varepsilon_{t-4} + c_3 \varepsilon_{t-5} + \cdots) + \varepsilon_t + \beta_1 \varepsilon_{t-1}
\end{aligned}$$

To match coefficients on the terms containing $\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$, it is necessary
to set

1. $c_0 = 1$

2. $c_1 = a_1 c_0 + \beta_1 \Rightarrow c_1 = a_1 + \beta_1$

3. $c_i = a_1 c_{i-1} + a_2 c_{i-2}$ for all $i \geq 2$


The key point is that, for $i \geq 2$, the coefficients satisfy the difference equation
$c_i = a_1 c_{i-1} + a_2 c_{i-2}$. If the characteristic roots of (2.16) are within the unit circle, the
$\{c_i\}$ must constitute a convergent sequence. For example, reconsider the case in which
$a_1 = 1.6$ and $a_2 = -0.9$, and let $\beta_1 = 0.5$. Worksheet 2.1 shows that the coefficients
satisfying (2.17) are 1, 2.1, 2.46, 2.046, 1.06, −0.146, and so on (also see Worksheet 1.2
of the previous chapter).
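
The matching conditions for the $c_i$ can be iterated numerically. The following is a minimal sketch, not part of the original text; the function name `arma21_ma_coefficients` is a hypothetical helper introduced only for illustration. It starts from $c_0 = 1$ and $c_1 = a_1 + \beta_1$ and then applies $c_i = a_1 c_{i-1} + a_2 c_{i-2}$.

```python
def arma21_ma_coefficients(a1, a2, beta1, n):
    """Return c_0, ..., c_{n-1} of the MA(infinity) form of an ARMA(2, 1) process.

    Hypothetical helper: it applies the three matching conditions
    c_0 = 1, c_1 = a_1 + beta_1, c_i = a_1 c_{i-1} + a_2 c_{i-2} (i >= 2).
    """
    c = [1.0, a1 + beta1]
    for i in range(2, n):
        c.append(a1 * c[i - 1] + a2 * c[i - 2])
    return c[:n]


# The example from the text: a_1 = 1.6, a_2 = -0.9, beta_1 = 0.5
coeffs = arma21_ma_coefficients(1.6, -0.9, 0.5, 6)
print([round(ci, 3) for ci in coeffs])
```

With these parameter values the first six coefficients should agree with the values 1, 2.1, 2.46, 2.046, 1.06, and −0.146 quoted above.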




WORKSHEET 2.1


COEFFICIENTS OF THE ARMA(2, 1) PROCESS:
$y_t = 1.6y_{t-1} - 0.9y_{t-2} + \varepsilon_t + 0.5\varepsilon_{t-1}$


If we use the method of undetermined coefficients, the $c_i$ must satisfy


$c_0 = 1$
$c_1 = 1.6 + 0.5$; hence $c_1 = 2.1$
$c_i = 1.6c_{i-1} - 0.9c_{i-2}$ for all $i = 2, 3, 4, \ldots$


Notice that the coefficients follow a second-order difference equation with imaginary
roots. If we use de Moivre's Theorem, the coefficients will satisfy

$$c_i = 0.949^i \cdot \beta_1 \cos(0.567i + \beta_2)$$

Imposing the initial conditions for $c_0$ and $c_1$ yields

$$1 = \beta_1 \cos(\beta_2) \quad \text{and} \quad 2.1 = 0.949\,\beta_1 \cos(0.567 + \beta_2)$$

Since $\beta_1 = 1/\cos(\beta_2)$, we seek the solution to

$$\cos(\beta_2) - (0.949/2.1) \cos(0.567 + \beta_2) = 0$$

You can use a calculator or a trig table to verify that the solution for $\beta_2$ is −1.197
and the solution for $\beta_1$ is 2.739. Hence, the $c_i$ must satisfy

$$c_i = 2.739 \cdot 0.949^i \cdot \cos(0.567i - 1.197)$$

Alternatively, we can use the initial values of $c_0$ and $c_1$ to find the other $c_i$ by iteration.
The sequence of the $c_i$ is shown in the graph.


You can use a spreadsheet to verify that the values of $c_0$ through $c_{10}$ are


i     0     1     2     3      4     5       6       7       8       9       10
c_i   1.00  2.10  2.46  2.046  1.06  −0.146  −1.187  −1.768  −1.761  −1.226  −0.378

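To verify that the $\{y_t\}$ sequence generated by (2.17) is stationary, take the expectation
of (2.17) to form $Ey_t = Ey_{t-i} = 0$ for all $t$ and $i$. Hence, the mean is finite and time
invariant. Since the $\{\varepsilon_t\}$ sequence is assumed to be a white-noise process, the variance
of $y_t$ is constant and time independent; that is,

$$\operatorname{var}(y_t) = E[(c_0 \varepsilon_t + c_1 \varepsilon_{t-1} + c_2 \varepsilon_{t-2} + c_3 \varepsilon_{t-3} + \cdots)^2] = \sigma^2 \sum_{i=0}^{\infty} c_i^2$$

Hence, $\operatorname{var}(y_t) = \operatorname{var}(y_{t-s})$ for all $t$ and $s$. Finally, the covariance between $y_t$ and $y_{t-1}$ is

$$\begin{aligned}
\operatorname{cov}(y_t, y_{t-1}) &= E[(\varepsilon_t + c_1 \varepsilon_{t-1} + c_2 \varepsilon_{t-2} + \cdots)(\varepsilon_{t-1} + c_1 \varepsilon_{t-2} + c_2 \varepsilon_{t-3} + c_3 \varepsilon_{t-4} + \cdots)] \\
&= \sigma^2 (c_1 + c_2 c_1 + c_3 c_2 + \cdots)
\end{aligned}$$

and the covariance between $y_t$ and $y_{t-2}$ is

$$\begin{aligned}
\operatorname{cov}(y_t, y_{t-2}) &= E[(\varepsilon_t + c_1 \varepsilon_{t-1} + c_2 \varepsilon_{t-2} + \cdots)(\varepsilon_{t-2} + c_1 \varepsilon_{t-3} + c_2 \varepsilon_{t-4} + c_3 \varepsilon_{t-5} + \cdots)] \\
&= \sigma^2 (c_2 + c_3 c_1 + c_4 c_2 + \cdots)
\end{aligned}$$

so that

$$\operatorname{cov}(y_t, y_{t-s}) = \sigma^2 (c_s + c_{s+1} c_1 + c_{s+2} c_2 + \cdots) \tag{2.18}$$

Thus, $\operatorname{cov}(y_t, y_{t-s})$ is constant and independent of $t$. Conversely, if the characteristic
roots of (2.16) do not lie within the unit circle, the $\{c_i\}$ sequence will not be convergent.
As such, the $\{y_t\}$ sequence cannot be convergent.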

It is not too difficult to generalize these results to the entire class of ARMA($p$, $q$)
models. Begin by considering the conditions ensuring the stationarity of a pure MA($\infty$)
process. By appropriately restricting the $\beta_i$, all of the finite-order MA($q$) processes can
be obtained as special cases. Consider

$$x_t = \sum_{i=0}^{\infty} \beta_i \varepsilon_{t-i}$$

where $\{\varepsilon_t\}$ = a white-noise process with variance $\sigma^2$.

We have already determined that $\{x_t\}$ is not a white-noise process; now, the issue
is whether $\{x_t\}$ is covariance stationary. Given conditions (2.7), (2.8), and (2.9), we ask
the following:


1. Is the mean finite and time independent? Take the expected value of $x_t$ and
remember that the expectation of a sum is the sum of the individual expectations.
Therefore,

$$E(x_t) = E(\varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \cdots) = E\varepsilon_t + \beta_1 E\varepsilon_{t-1} + \beta_2 E\varepsilon_{t-2} + \cdots = 0$$

Repeat the procedure with $x_{t-s}$:

$$E(x_{t-s}) = E(\varepsilon_{t-s} + \beta_1 \varepsilon_{t-s-1} + \beta_2 \varepsilon_{t-s-2} + \cdots) = 0$$

Hence, all elements in the $\{x_t\}$ sequence have the same finite mean ($\mu = 0$).




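2. Is the variance finite and time independent? Form $\operatorname{var}(x_t)$ as

$$\operatorname{var}(x_t) = E[(\varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \cdots)^2]$$

Square the term in parentheses and take expectations. Since $\{\varepsilon_t\}$ is a
white-noise process, all terms $E\varepsilon_t \varepsilon_{t-s} = 0$ for $s \neq 0$. Hence,

$$\operatorname{var}(x_t) = E(\varepsilon_t)^2 + (\beta_1)^2 E(\varepsilon_{t-1})^2 + (\beta_2)^2 E(\varepsilon_{t-2})^2 + \cdots = \sigma^2 [1 + (\beta_1)^2 + (\beta_2)^2 + \cdots]$$

As long as $\sum (\beta_i)^2$ is finite, it follows that $\operatorname{var}(x_t)$ is finite. Thus, $\sum (\beta_i)^2$ being
finite is a necessary condition for $\{x_t\}$ to be stationary. To determine whether
$\operatorname{var}(x_t) = \operatorname{var}(x_{t-s})$, form

$$\operatorname{var}(x_{t-s}) = E[(\varepsilon_{t-s} + \beta_1 \varepsilon_{t-s-1} + \beta_2 \varepsilon_{t-s-2} + \cdots)^2] = \sigma^2 [1 + (\beta_1)^2 + (\beta_2)^2 + \cdots]$$

Thus, $\operatorname{var}(x_t) = \operatorname{var}(x_{t-s})$ for all $t$ and $t - s$.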


3. Are all autocovariances finite and time independent? First, form $E(x_t x_{t-s})$ as

$$E[x_t x_{t-s}] = E[(\varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \cdots)(\varepsilon_{t-s} + \beta_1 \varepsilon_{t-s-1} + \beta_2 \varepsilon_{t-s-2} + \cdots)]$$

Carrying out the multiplication and noting that $E(\varepsilon_t \varepsilon_{t-s}) = 0$ for $s \neq 0$, we get

$$E(x_t x_{t-s}) = \sigma^2 (\beta_s + \beta_1 \beta_{s+1} + \beta_2 \beta_{s+2} + \cdots)$$

Restricting the sum $\beta_s + \beta_1 \beta_{s+1} + \beta_2 \beta_{s+2} + \cdots$ to be finite means that $E(x_t x_{t-s})$ is
finite. Given this second restriction, it is clear that the covariance between $x_t$ and $x_{t-s}$
only depends on the number of periods separating the variables (i.e., the value of $s$) but
not on the time subscript $t$.

In summary, the necessary and sufficient conditions for any MA process to be
covariance stationary are for the sums $\sum (\beta_i)^2$ and $(\beta_s + \beta_1 \beta_{s+1} + \beta_2 \beta_{s+2} + \cdots)$ to be
finite. For an infinite-order process, these conditions must hold for all $s \geq 0$. Some of
the details involved with maximum likelihood estimation of MA processes are
discussed in Appendix 2.1.


Stationarity Restrictions for the Autoregressive 
Coefficients 


Now consider the pure autoregressive model

$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \varepsilon_t \tag{2.19}$$

If the characteristic roots of the homogeneous equation of (2.19) all lie inside the
unit circle, it is possible to write the particular solution as

$$y_t = a_0 \Big/ \Big(1 - \sum_{i=1}^{p} a_i\Big) + \sum_{i=0}^{\infty} c_i \varepsilon_{t-i} \tag{2.20}$$

where $c_i$ = undetermined coefficients.
Although it is possible to find the undetermined coefficients $\{c_i\}$, we know that
(2.20) is a convergent sequence so long as the characteristic roots of (2.19) are inside
the unit circle. To sketch the proof, the method of undetermined coefficients allows us
to write the particular solution in the form of (2.20). We also know that the sequence
$\{c_i\}$ will eventually solve the difference equation

$$c_i - a_1 c_{i-1} - a_2 c_{i-2} - \cdots - a_p c_{i-p} = 0 \tag{2.21}$$

If the characteristic roots of (2.21) are all inside the unit circle, the $\{c_i\}$ sequence
will be convergent. Although (2.20) is an infinite-order moving average process, the
convergence of the MA coefficients implies that $\sum c_i^2$ is finite. Thus, we can use (2.20)
to check the three conditions for stationarity. From (2.20),

$$Ey_t = Ey_{t-s} = a_0 \Big/ \Big(1 - \sum a_i\Big)$$

You should recall from Chapter 1 that a necessary condition for all characteristic
roots to lie inside the unit circle is $1 - \sum a_i > 0$. Hence, the mean of the sequence is
finite and time invariant.


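$$\operatorname{Var}(y_t) = E[(\varepsilon_t + c_1 \varepsilon_{t-1} + c_2 \varepsilon_{t-2} + c_3 \varepsilon_{t-3} + \cdots)^2] = \sigma^2 \sum c_i^2$$

and

$$\operatorname{Var}(y_{t-s}) = E[(\varepsilon_{t-s} + c_1 \varepsilon_{t-s-1} + c_2 \varepsilon_{t-s-2} + c_3 \varepsilon_{t-s-3} + \cdots)^2] = \sigma^2 \sum c_i^2$$

Given that $\sum c_i^2$ is finite, the variance is finite and time independent.

$$\begin{aligned}
\operatorname{Cov}(y_t, y_{t-s}) &= E[(\varepsilon_t + c_1 \varepsilon_{t-1} + c_2 \varepsilon_{t-2} + \cdots)(\varepsilon_{t-s} + c_1 \varepsilon_{t-s-1} + c_2 \varepsilon_{t-s-2} + \cdots)] \\
&= \sigma^2 (c_s + c_1 c_{s+1} + c_2 c_{s+2} + \cdots)
\end{aligned}$$

Thus, the covariance between $y_t$ and $y_{t-s}$ is constant and time invariant for all $t$ and
$t - s$. Nothing of substance is changed by combining the AR($p$) and MA($q$) models into
the general ARMA($p$, $q$) model

$$\begin{aligned}
y_t &= a_0 + \sum_{i=1}^{p} a_i y_{t-i} + x_t \\
x_t &= \sum_{i=0}^{q} \beta_i \varepsilon_{t-i}
\end{aligned} \tag{2.22}$$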


If the roots of the inverse characteristic equation lie outside the unit circle [i.e., if
the roots of the homogeneous form of (2.22) lie inside the unit circle] and if the $\{x_t\}$
sequence is stationary, the $\{y_t\}$ sequence will be stationary. Consider

$$y_t = \frac{a_0}{1 - \sum_{i=1}^{p} a_i} + \frac{\varepsilon_t}{1 - \sum_{i=1}^{p} a_i L^i} + \frac{\beta_1 \varepsilon_{t-1}}{1 - \sum_{i=1}^{p} a_i L^i} + \frac{\beta_2 \varepsilon_{t-2}}{1 - \sum_{i=1}^{p} a_i L^i} + \cdots \tag{2.23}$$

With very little effort, you can convince yourself that the $\{y_t\}$ sequence satisfies the
three conditions for stationarity. Each of the expressions on the right-hand side of (2.23)
is stationary as long as the roots of $1 - \sum a_i L^i$ are outside the unit circle. Given that $\{x_t\}$
is stationary, only the roots of the autoregressive portion of (2.22) determine whether
the $\{y_t\}$ sequence is stationary.
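
The root condition can be checked numerically. The sketch below is an illustration rather than anything from the original text: the function name `ar_is_stationary` is hypothetical, and it assumes NumPy is available. It forms the inverse characteristic polynomial $1 - \sum a_i L^i$ and tests whether every root lies outside the unit circle.

```python
import numpy as np


def ar_is_stationary(a):
    """a = [a_1, ..., a_p]. True if all roots of 1 - sum(a_i L^i) lie outside the unit circle.

    Hypothetical helper for illustration only.
    """
    # numpy.roots expects coefficients ordered from the highest power of L down:
    # -a_p L^p - ... - a_1 L + 1
    poly = [-ai for ai in reversed(a)] + [1.0]
    roots = np.roots(poly)
    return bool(np.all(np.abs(roots) > 1.0))


print(ar_is_stationary([1.6, -0.9]))   # AR part of the ARMA(2, 1) example in Worksheet 2.1
print(ar_is_stationary([0.5, 0.6]))    # hypothetical coefficients whose sum exceeds one
```

Only the autoregressive coefficients enter the check, in line with the point that a finite-order MA component does not affect stationarity. A root that sits numerically on the unit circle is treated as nonstationary by this sketch.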


5. THE AUTOCORRELATION FUNCTION 


The autocovariances and autocorrelations of the type found in (2.18) serve as useful
tools in the Box-Jenkins (1976) approach to identifying and estimating time-series
models. We illustrate by considering four important examples: the AR(1), AR(2),
MA(1), and ARMA(1, 1) models. For the AR(1) model, $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$, (2.14)
shows

$$\gamma_0 = \sigma^2/[1 - (a_1)^2]$$

$$\gamma_s = \sigma^2 (a_1)^s/[1 - (a_1)^2]$$

Forming the autocorrelations by dividing each $\gamma_s$ by $\gamma_0$, we find that $\rho_0 = 1$,
$\rho_1 = a_1$, $\rho_2 = (a_1)^2, \ldots, \rho_s = (a_1)^s$. For an AR(1) process, a necessary condition for
stationarity is for $|a_1| < 1$. Thus, the plot of $\rho_s$ against $s$, called the autocorrelation
function (ACF) or correlogram, should converge to zero geometrically if the
series is stationary. If $a_1$ is positive, convergence will be direct, and if $a_1$ is negative,
the autocorrelations will follow a dampened oscillatory path around zero. The first
two graphs on the left-hand side of Figure 2.2 show the theoretical autocorrelation
functions for $a_1 = 0.7$ and $a_1 = -0.7$, respectively. Here, $\rho_0$ is not shown since its
value is necessarily unity.


The Autocorrelation Function of an AR(2) Process 


Now consider the more complicated AR(2) process $y_t = a_1 y_{t-1} + a_2 y_{t-2} + \varepsilon_t$. We omit
an intercept term ($a_0$) since it has no effect on the ACF. For the second-order process to
be stationary, we know that it is necessary to restrict the roots of $(1 - a_1 L - a_2 L^2)$ to be
outside the unit circle. In Section 4, we derived the autocovariances of an ARMA(2, 1)
process by use of the method of undetermined coefficients. Now, we want to illustrate
an alternative technique using the Yule-Walker equations. Multiply the second-order
difference equation by $y_{t-s}$ for $s = 0$, $s = 1$, $s = 2, \ldots$ and take expectations to form

$$\begin{aligned}
Ey_t y_t &= a_1 Ey_{t-1} y_t + a_2 Ey_{t-2} y_t + E\varepsilon_t y_t \\
Ey_t y_{t-1} &= a_1 Ey_{t-1} y_{t-1} + a_2 Ey_{t-2} y_{t-1} + E\varepsilon_t y_{t-1} \\
Ey_t y_{t-2} &= a_1 Ey_{t-1} y_{t-2} + a_2 Ey_{t-2} y_{t-2} + E\varepsilon_t y_{t-2} \\
&\;\;\vdots \\
Ey_t y_{t-s} &= a_1 Ey_{t-1} y_{t-s} + a_2 Ey_{t-2} y_{t-s} + E\varepsilon_t y_{t-s}
\end{aligned} \tag{2.24}$$

By definition, the autocovariances of a stationary series are such that $Ey_t y_{t-s} =
Ey_{t-s} y_t = Ey_{t-k} y_{t-k-s} = \gamma_s$. We also know that $E\varepsilon_t y_t = \sigma^2$ and $E\varepsilon_t y_{t-s} = 0$. Hence, we
can use the equations in (2.24) to form

$$\gamma_0 = a_1 \gamma_1 + a_2 \gamma_2 + \sigma^2 \tag{2.25}$$

$$\gamma_1 = a_1 \gamma_0 + a_2 \gamma_1 \tag{2.26}$$

$$\gamma_s = a_1 \gamma_{s-1} + a_2 \gamma_{s-2} \tag{2.27}$$

Dividing (2.26) and (2.27) by $\gamma_0$ yields

$$\rho_1 = a_1 \rho_0 + a_2 \rho_1 \tag{2.28}$$

$$\rho_s = a_1 \rho_{s-1} + a_2 \rho_{s-2} \tag{2.29}$$

FIGURE 2.2 Theoretical ACF and PACF Patterns



We know that $\rho_0 = 1$, so that from (2.28), $\rho_1 = a_1/(1 - a_2)$. Hence, we can find
all $\rho_s$ for $s \geq 2$ by solving the difference equation (2.29). For example, for $s = 2$
and $s = 3$,

$$\rho_2 = (a_1)^2/(1 - a_2) + a_2$$

$$\rho_3 = a_1[(a_1)^2/(1 - a_2) + a_2] + a_2 a_1/(1 - a_2)$$

Although the values of the $\rho_s$ are cumbersome to derive, we can easily characterize
their properties. Given the solutions for $\rho_0$ and $\rho_1$, the key point to note is that the $\rho_s$
all satisfy the difference equation (2.29). As in the case of a second-order difference
equation, the solution may be oscillatory or direct. Note that the stationarity condition
for $y_t$ necessitates that the characteristic roots of (2.29) lie inside the unit circle. Hence,
the $\{\rho_s\}$ sequence must be convergent. The correlogram for an AR(2) process must be
such that $\rho_0 = 1$ and $\rho_1$ is determined by (2.28). These two values can be viewed as
initial values for the second-order difference equation (2.29).
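
The recursion in (2.28) and (2.29) is easy to iterate. The following is a minimal sketch under the assumption that the process is stationary; the function name `ar2_acf` is a hypothetical helper, not something defined in the text. It uses $\rho_0 = 1$ and $\rho_1 = a_1/(1 - a_2)$ as initial values.

```python
def ar2_acf(a1, a2, max_lag):
    """Theoretical autocorrelations rho_0, ..., rho_max_lag of a stationary AR(2) process.

    Hypothetical helper: rho_1 comes from (2.28), later lags from (2.29).
    """
    rho = [1.0, a1 / (1.0 - a2)]
    for s in range(2, max_lag + 1):
        rho.append(a1 * rho[s - 1] + a2 * rho[s - 2])
    return rho[: max_lag + 1]


# Coefficients of the AR(2) example discussed in this section
rho = ar2_acf(0.7, -0.49, 8)
print([round(r, 3) for r in rho])
```

The second and third entries can be compared with the closed-form expressions for $\rho_2$ and $\rho_3$ given above.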

The fourth panel on the left-hand side of Figure 2.2 shows the ACF for the process
$y_t = 0.7y_{t-1} - 0.49y_{t-2} + \varepsilon_t$. The properties of the various $\rho_s$ follow directly from the
homogeneous equation $y_t - 0.7y_{t-1} + 0.49y_{t-2} = 0$. The roots are obtained from the
solution to

$$\alpha = \{0.7 \pm [(-0.7)^2 - 4(0.49)]^{1/2}\}/2$$

Since the discriminant $d = (-0.7)^2 - 4(0.49)$ is negative, the characteristic roots
are imaginary so that the solution oscillates. However, since $a_2 = -0.49$, the solution
is convergent and the $\{y_t\}$ sequence is stationary.

Finally, we may wish to find the autocovariances rather than the autocorrelations.
Since we know all of the autocorrelations, if we can find the variance of $y_t$ (i.e., $\gamma_0$), we
can find all of the other $\gamma_s$. To find $\gamma_0$, use (2.25) and note that $\rho_i = \gamma_i/\gamma_0$ so that

$$\gamma_0 (1 - a_1 \rho_1 - a_2 \rho_2) = \sigma^2$$

Substitution for $\rho_1$ and $\rho_2$ yields

$$\gamma_0 = \operatorname{var}(y_t) = \left[\frac{1 - a_2}{1 + a_2}\right] \left(\frac{\sigma^2}{(a_1 + a_2 - 1)(a_2 - a_1 - 1)}\right)$$

The Autocorrelation Function of an MA(1) Process


Next consider the MA(1) process $y_t = \varepsilon_t + \beta \varepsilon_{t-1}$. Again, we can obtain the
Yule-Walker equations by multiplying $y_t$ by each $y_{t-s}$ and taking expectations

$$\gamma_0 = \operatorname{var}(y_t) = Ey_t y_t = E[(\varepsilon_t + \beta \varepsilon_{t-1})(\varepsilon_t + \beta \varepsilon_{t-1})] = (1 + \beta^2)\sigma^2$$

$$\gamma_1 = Ey_t y_{t-1} = E[(\varepsilon_t + \beta \varepsilon_{t-1})(\varepsilon_{t-1} + \beta \varepsilon_{t-2})] = \beta \sigma^2$$

and

$$\gamma_s = Ey_t y_{t-s} = E[(\varepsilon_t + \beta \varepsilon_{t-1})(\varepsilon_{t-s} + \beta \varepsilon_{t-s-1})] = 0 \quad \text{for all } s > 1$$

Hence, dividing each $\gamma_s$ by $\gamma_0$, it can be immediately seen that the ACF is simply
$\rho_0 = 1$, $\rho_1 = \beta/(1 + \beta^2)$, and $\rho_s = 0$ for all $s > 1$. The third graph on the left-hand side
of Figure 2.2 shows the ACF for the MA(1) process $y_t = \varepsilon_t - 0.7\varepsilon_{t-1}$. As an exercise,
you should demonstrate that the ACF for an MA(2) process has two spikes and then
cuts to zero.
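
The same multiplication of moving average terms extends to any finite order. The sketch below is illustrative only: the function name `ma_acf` and the MA(2) coefficients 0.5 and 0.3 are assumptions chosen for the example, not values from the text. It uses the autocovariance expression $\gamma_s = \sigma^2 \sum_i \beta_i \beta_{i+s}$ with $\beta_0 = 1$, so $\sigma^2$ cancels in the ratio $\gamma_s/\gamma_0$.

```python
def ma_acf(betas, max_lag):
    """Theoretical ACF of y_t = e_t + beta_1 e_{t-1} + ... + beta_q e_{t-q}.

    Hypothetical helper; returns [rho_0, ..., rho_max_lag].
    """
    b = [1.0] + list(betas)          # beta_0 = 1
    q = len(b) - 1
    gamma = []
    for s in range(max_lag + 1):
        if s <= q:
            # gamma_s / sigma^2 = sum over i of b_i * b_{i+s}
            gamma.append(sum(b[i] * b[i + s] for i in range(q + 1 - s)))
        else:
            gamma.append(0.0)        # no common shocks once the lag exceeds q
    return [g / gamma[0] for g in gamma]


ma1 = ma_acf([-0.7], 3)        # the MA(1) process plotted in Figure 2.2
ma2 = ma_acf([0.5, 0.3], 4)    # hypothetical MA(2) coefficients
print([round(r, 3) for r in ma1])
print([round(r, 3) for r in ma2])
```

For the MA(1) case only the lag-1 autocorrelation is nonzero, and for the MA(2) case the first two are nonzero, which is the pattern the exercise asks you to demonstrate.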


The Autocorrelation Function 
of an ARMA(1,1) Process 


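Finally, let $y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1 \varepsilon_{t-1}$. Using the now-familiar procedure, we find the
Yule-Walker equations

$$Ey_t y_t = a_1 Ey_{t-1} y_t + E\varepsilon_t y_t + \beta_1 E\varepsilon_{t-1} y_t \;\Rightarrow\; \gamma_0 = a_1 \gamma_1 + \sigma^2 + \beta_1 (a_1 + \beta_1)\sigma^2 \tag{2.30}$$

$$Ey_t y_{t-1} = a_1 Ey_{t-1} y_{t-1} + E\varepsilon_t y_{t-1} + \beta_1 E\varepsilon_{t-1} y_{t-1} \;\Rightarrow\; \gamma_1 = a_1 \gamma_0 + \beta_1 \sigma^2 \tag{2.31}$$

$$Ey_t y_{t-2} = a_1 Ey_{t-1} y_{t-2} + E\varepsilon_t y_{t-2} + \beta_1 E\varepsilon_{t-1} y_{t-2} \;\Rightarrow\; \gamma_2 = a_1 \gamma_1 \tag{2.32}$$

$$Ey_t y_{t-s} = a_1 Ey_{t-1} y_{t-s} + E\varepsilon_t y_{t-s} + \beta_1 E\varepsilon_{t-1} y_{t-s} \;\Rightarrow\; \gamma_s = a_1 \gamma_{s-1} \tag{2.33}$$

In obtaining (2.30), note that $E\varepsilon_{t-1} y_t$ is $(a_1 + \beta_1)\sigma^2$. Solving (2.30) and (2.31)
simultaneously for $\gamma_0$ and $\gamma_1$ yields

$$\gamma_0 = \frac{1 + \beta_1^2 + 2a_1 \beta_1}{1 - a_1^2}\,\sigma^2$$

$$\gamma_1 = \frac{(1 + a_1 \beta_1)(a_1 + \beta_1)}{1 - a_1^2}\,\sigma^2$$

Form the ratio $\gamma_1/\gamma_0$ to obtain

$$\rho_1 = \frac{(1 + a_1 \beta_1)(a_1 + \beta_1)}{1 + \beta_1^2 + 2a_1 \beta_1} \tag{2.34}$$

and $\rho_s = a_1 \rho_{s-1}$ for all $s \geq 2$.

Thus, the ACF for an ARMA(1, 1) process is such that the magnitude of $\rho_1$ depends
on both $a_1$ and $\beta_1$. Beginning with this value of $\rho_1$, the ACF of an ARMA(1, 1) process
looks like that of the AR(1) process. If $0 < a_1 < 1$, convergence will be direct,
and if $-1 < a_1 < 0$, the autocorrelations will oscillate. The ACF for the process
$y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$ is shown as the last graph on the left-hand side of Figure 2.2.
The top portion of Worksheet 2.2 derives these autocorrelations.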
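
Equation (2.34) and the decay rule $\rho_s = a_1 \rho_{s-1}$ translate directly into a short routine. This is a minimal sketch; the function name `arma11_acf` is a hypothetical helper introduced for illustration.

```python
def arma11_acf(a1, beta1, max_lag):
    """Theoretical ACF rho_0, ..., rho_max_lag of a stationary ARMA(1, 1) process.

    Hypothetical helper: rho_1 from (2.34), then rho_s = a_1 * rho_{s-1} for s >= 2.
    """
    rho1 = (1 + a1 * beta1) * (a1 + beta1) / (1 + beta1 ** 2 + 2 * a1 * beta1)
    rho = [1.0, rho1]
    for s in range(2, max_lag + 1):
        rho.append(a1 * rho[s - 1])
    return rho[: max_lag + 1]


# The process y_t = -0.7 y_{t-1} + e_t - 0.7 e_{t-1} from Figure 2.2
acf_arma11 = arma11_acf(-0.7, -0.7, 8)
print([round(r, 4) for r in acf_arma11])
```

For this process the first two autocorrelations should be close to the values −0.8445 and 0.591 reported in Worksheet 2.2, with the sign alternating thereafter because $a_1$ is negative.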

We leave you with the exercise of deriving the correlogram of the ARMA(2, 1)
process used in Worksheet 2.1. You should be able to recognize the point that the
correlogram can reveal the pattern of the autoregressive coefficients. For an ARMA($p$, $q$)
model beginning after lag $q$, the values of the $\rho_i$ will satisfy

$$\rho_i = a_1 \rho_{i-1} + a_2 \rho_{i-2} + \cdots + a_p \rho_{i-p}$$

The previous $p$ values can be treated as initial conditions that satisfy the
Yule-Walker equations. For these lags, the shape of the ACF is determined by the
characteristic equation.
WORKSHEET 2.2


CALCULATION OF THE PARTIAL AUTOCORRELATIONS OF
$y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$


Step 1: Calculate the autocorrelations. Use (2.34) to calculate $\rho_1$ as

$$\rho_1 = \frac{(1 + 0.49)(-0.7 - 0.7)}{1 + 0.49 + 2(0.49)} = -0.8445$$

The remaining autocorrelations decay at the rate $\rho_i = -0.7\rho_{i-1}$ so that
$\rho_2 = 0.591$, $\rho_3 = -0.414$, $\rho_4 = 0.290$, $\rho_5 = -0.203$, $\rho_6 = 0.142$, $\rho_7 = -0.099$, $\rho_8 = 0.070$


Step 2: Calculate the first two partial autocorrelations using (2.35) and (2.36). Hence,

$$\phi_{11} = \rho_1 = -0.8445$$

$$\phi_{22} = [0.591 - (-0.8445)^2]/[1 - (-0.8445)^2] = -0.426$$


Step 3: Calculate all remaining $\phi_{ss}$ iteratively using (2.37). To find $\phi_{33}$, note that
$\phi_{21} = \phi_{11} - \phi_{22}\phi_{11} = -1.204$ and form

$$\begin{aligned}
\phi_{33} &= \Big(\rho_3 - \sum_{j=1}^{2} \phi_{2j}\rho_{3-j}\Big)\Big(1 - \sum_{j=1}^{2} \phi_{2j}\rho_j\Big)^{-1} \\
&= [-0.414 - (-1.204)(0.591) - (-0.426)(-0.8445)]/[1 - (-1.204)(-0.8445) - (-0.426)(0.591)] \\
&= -0.262
\end{aligned}$$

Similarly, to find $\phi_{44}$, use

$$\phi_{44} = \Big(\rho_4 - \sum_{j=1}^{3} \phi_{3j}\rho_{4-j}\Big)\Big(1 - \sum_{j=1}^{3} \phi_{3j}\rho_j\Big)^{-1}$$

Since $\phi_{3j} = \phi_{2j} - \phi_{33}\phi_{2,3-j}$, it follows that $\phi_{31} = -1.315$ and $\phi_{32} = -0.74$. Hence,
$\phi_{44} = -0.173$


If we continue in this manner, it is possible to demonstrate that $\phi_{55} = -0.117$,
$\phi_{66} = -0.081$, $\phi_{77} = -0.056$, and $\phi_{88} = -0.039$.


6. THE PARTIAL AUTOCORRELATION FUNCTION 


In an AR(1) process, $y_t$ and $y_{t-2}$ are correlated even though $y_{t-2}$ does not directly appear
in the model. The correlation between $y_t$ and $y_{t-2}$ (i.e., $\rho_2$) is equal to the correlation
between $y_t$ and $y_{t-1}$ (i.e., $\rho_1$) multiplied by the correlation between $y_{t-1}$ and $y_{t-2}$ (i.e.,
$\rho_1$ again) so that $\rho_2 = (\rho_1)^2$. It is important to note that all such indirect correlations
are present in the ACF of any autoregressive process. In contrast, the partial
autocorrelation between $y_t$ and $y_{t-s}$ eliminates the effects of the intervening values $y_{t-1}$
through $y_{t-s+1}$. As such, in an AR(1) process, the partial autocorrelation between $y_t$
and $y_{t-2}$ is equal to zero. The most direct way to find the partial autocorrelation
function is to first form the series $\{y_t^*\}$ by subtracting the mean of the series (i.e., $\mu$) from
each observation to obtain $y_t^* = y_t - \mu$. Next, form the first-order autoregression

$$y_t^* = \phi_{11} y_{t-1}^* + e_t$$

where $e_t$ is an error term.

Here, the symbol $\{e_t\}$ is used since this error process may not be white noise.

Since there are no intervening values, $\phi_{11}$ is both the autocorrelation and the
partial autocorrelation between $y_t$ and $y_{t-1}$. Now, form the second-order autoregression
equation

$$y_t^* = \phi_{21} y_{t-1}^* + \phi_{22} y_{t-2}^* + e_t$$

Here, $\phi_{22}$ is the partial autocorrelation coefficient between $y_t$ and $y_{t-2}$. In other
words, $\phi_{22}$ is the correlation between $y_t$ and $y_{t-2}$ controlling for (i.e., "netting out") the
effect of $y_{t-1}$. Repeating this process for all additional lags $s$ yields the partial
autocorrelation function (PACF). In practice, with sample size $T$, only $T/4$ lags are used in
obtaining the sample PACF.

Since most statistical computer packages perform these transformations, there
is little need to elaborate on the computational procedure. However, it should be
pointed out that a simple computational method relying on the so-called Yule-Walker
equations is available. One can form the partial autocorrelations from the
autocorrelations as

$$\phi_{11} = \rho_1 \tag{2.35}$$

$$\phi_{22} = (\rho_2 - \rho_1^2)/(1 - \rho_1^2) \tag{2.36}$$

and, for additional lags,

$$\phi_{ss} = \frac{\rho_s - \sum_{j=1}^{s-1} \phi_{s-1,j}\,\rho_{s-j}}{1 - \sum_{j=1}^{s-1} \phi_{s-1,j}\,\rho_j}, \quad s = 3, 4, 5, \ldots \tag{2.37}$$

where $\phi_{sj} = \phi_{s-1,j} - \phi_{ss}\phi_{s-1,s-j}$, $j = 1, 2, 3, \ldots, s - 1$.
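
The recursion in (2.35) through (2.37) can be coded in a few lines. The sketch below is an illustration under stated assumptions: the function name `pacf_from_acf` is hypothetical, and the input is a list of theoretical autocorrelations with `rho[0] = 1`. It is applied to the ARMA(1, 1) process of Worksheet 2.2.

```python
def pacf_from_acf(rho, max_lag):
    """Return [phi_11, phi_22, ..., phi_{max_lag,max_lag}] from autocorrelations rho.

    Hypothetical helper. rho[s] is the autocorrelation at lag s, with rho[0] = 1.
    """
    phi_prev = {1: rho[1]}                       # phi_{1,1} from (2.35)
    pacf = [rho[1]]
    for s in range(2, max_lag + 1):
        num = rho[s] - sum(phi_prev[j] * rho[s - j] for j in range(1, s))
        den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(1, s))
        phi_ss = num / den                       # (2.36) when s = 2, (2.37) otherwise
        # phi_{s,j} = phi_{s-1,j} - phi_{ss} * phi_{s-1,s-j}
        phi_cur = {j: phi_prev[j] - phi_ss * phi_prev[s - j] for j in range(1, s)}
        phi_cur[s] = phi_ss
        pacf.append(phi_ss)
        phi_prev = phi_cur
    return pacf


# Autocorrelations of y_t = -0.7 y_{t-1} + e_t - 0.7 e_{t-1}, from (2.34)
a1, beta1 = -0.7, -0.7
rho1 = (1 + a1 * beta1) * (a1 + beta1) / (1 + beta1 ** 2 + 2 * a1 * beta1)
rho = [1.0, rho1]
for s in range(2, 9):
    rho.append(a1 * rho[-1])

pacf = pacf_from_acf(rho, 8)
print([round(p, 3) for p in pacf])
```

The first four values should be close to the $\phi_{11}$ through $\phi_{44}$ reported in Worksheet 2.2; small differences in the third decimal place can arise because the worksheet rounds intermediate results.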

For an AR($p$) process, there is no direct correlation between $y_t$ and $y_{t-s}$ for $s > p$.
Hence, for $s > p$, all values of $\phi_{ss}$ will be zero, and the PACF for a pure AR($p$) process
should cut to zero for all lags greater than $p$. This is a useful feature of the PACF that
can aid in the identification of an AR($p$) model. In contrast, consider the PACF for
the MA(1) process: $y_t = \varepsilon_t + \beta \varepsilon_{t-1}$. As long as $\beta \neq -1$, we can write $y_t/(1 + \beta L) = \varepsilon_t$,
which we know has the infinite-order autoregressive representation

$$y_t - \beta y_{t-1} + \beta^2 y_{t-2} - \beta^3 y_{t-3} + \cdots = \varepsilon_t$$

As such, the PACF will not jump to zero since $y_t$ will be correlated with all of its
own lags. Instead, the PACF coefficients exhibit a geometrically decaying pattern. If
$\beta < 0$, decay is direct, and if $\beta > 0$, the PACF coefficients oscillate.
Worksheet 2.2 illustrates the procedure used in constructing the PACF for the
ARMA(1, 1) model shown in the fifth panel on the right-hand side of Figure 2.2:

$$y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$$

First, calculate the autocorrelations. Clearly, $\rho_0 = 1$; use equation (2.34) to calculate
$\rho_1 = -0.8445$. Thereafter, the ACF coefficients decay at the rate $\rho_i = (-0.7)\rho_{i-1}$
for $i \geq 2$. Using (2.35) and (2.36), we obtain $\phi_{11} = -0.8445$ and $\phi_{22} = -0.426$. All
subsequent $\phi_{ss}$ and $\phi_{sj}$ can be calculated from (2.37) as in Worksheet 2.2.

More generally, the PACF of a stationary ARMA($p$, $q$) process must ultimately
decay toward zero beginning at lag $p$. The decay pattern depends on the coefficients
of the polynomial $(1 + \beta_1 L + \beta_2 L^2 + \cdots + \beta_q L^q)$. Table 2.1 summarizes some of the
properties of the ACF and PACF for various ARMA processes. Also, the right-hand
side graphs of Figure 2.2 show the partial autocorrelation functions of the five indicated
processes.

For stationary processes, the key points to note are the following: 


1. The ACF of an ARMA($p$, $q$) process will begin to decay after lag $q$. After
lag $q$, the coefficients of the ACF (i.e., the $\rho_i$) will satisfy the difference
equation ($\rho_i = a_1 \rho_{i-1} + a_2 \rho_{i-2} + \cdots + a_p \rho_{i-p}$). Since the characteristic
roots are inside the unit circle, the autocorrelations will decay after lag $q$.
Moreover, the pattern of the autocorrelation coefficients will mimic that
suggested by the characteristic roots.

2. The PACF of an ARMA($p$, $q$) process will begin to decay after lag $p$. After
lag $p$, the coefficients of the PACF (i.e., the $\phi_{ss}$) will mimic the ACF
coefficients from the model $y_t/(1 + \beta_1 L + \beta_2 L^2 + \cdots + \beta_q L^q)$.


Table 2.1 Properties of the ACF and PACF

| Process | ACF | PACF |
|---|---|---|
| White noise | All $\rho_s = 0$ ($s \neq 0$) | All $\phi_{ss} = 0$ |
| AR(1): $a_1 > 0$ | Direct geometric decay: $\rho_s = a_1^s$ | $\phi_{11} = \rho_1$; $\phi_{ss} = 0$ for $s \geq 2$ |
| AR(1): $a_1 < 0$ | Oscillating decay: $\rho_s = a_1^s$ | $\phi_{11} = \rho_1$; $\phi_{ss} = 0$ for $s \geq 2$ |
| AR($p$) | Decays toward zero. Coefficients may oscillate. | Spikes through lag $p$. All $\phi_{ss} = 0$ for $s > p$ |
| MA(1): $\beta > 0$ | Positive spike at lag 1. $\rho_s = 0$ for $s \geq 2$ | Oscillating decay: $\phi_{11} > 0$ |
| MA(1): $\beta < 0$ | Negative spike at lag 1. $\rho_s = 0$ for $s \geq 2$ | Geometric decay: $\phi_{11} < 0$ |
| ARMA(1, 1): $a_1 > 0$ | Geometric decay beginning after lag 1. Sign $\rho_1$ = sign($a_1 + \beta$) | Oscillating decay after lag 1. $\phi_{11} = \rho_1$ |
| ARMA(1, 1): $a_1 < 0$ | Oscillating decay beginning after lag 1. Sign $\rho_1$ = sign($a_1 + \beta$) | Geometric decay beginning after lag 1. $\phi_{11} = \rho_1$ and sign($\phi_{ss}$) = sign($\phi_{11}$) |
| ARMA($p$, $q$) | Decay (either direct or oscillatory) beginning after lag $q$ | Decay (either direct or oscillatory) beginning after lag $p$ |

We can illustrate the usefulness of the ACF and PACF functions using the model
$y_t = a_0 + 0.7y_{t-1} + \varepsilon_t$. If we compare the top two graphs in Figure 2.2, the ACF shows
the monotonic decay of the autocorrelations while the PACF exhibits the single spike at
lag 1. Suppose that a researcher collected sample data and plotted the ACF and PACF. If
the actual patterns compared favorably to the theoretical patterns, the researcher might
try to fit the data using an AR(1) model. Correspondingly, if the ACF exhibited a
single spike and the PACF exhibited monotonic decay (see the third graph for the model
$y_t = \varepsilon_t - 0.7\varepsilon_{t-1}$), the researcher might try an MA(1) model.


7. SAMPLE AUTOCORRELATIONS 
OF STATIONARY SERIES 


In practice, the theoretical mean, variance, and autocorrelations of a series are unknown
to the researcher. Given that a series is stationary, we can use the sample mean, variance,
and autocorrelations to estimate the parameters of the actual data-generating process.
Let there be $T$ observations labeled $y_1$ through $y_T$. We can let $\bar{y}$, $\hat{\sigma}^2$, and $r_s$ be estimates
of $\mu$, $\sigma^2$, and $\rho_s$, respectively, where¹

$$\bar{y} = (1/T) \sum_{t=1}^{T} y_t \tag{2.38}$$

$$\hat{\sigma}^2 = (1/T) \sum_{t=1}^{T} (y_t - \bar{y})^2 \tag{2.39}$$

and for each value of $s = 1, 2, \ldots$,

$$r_s = \frac{\sum_{t=s+1}^{T} (y_t - \bar{y})(y_{t-s} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2} \tag{2.40}$$


The sample autocorrelation function [i.e., the ACF derived from (2.40)] and the
sample PACF can be compared to various theoretical functions to help identify the
actual nature of the data-generating process. Box and Jenkins (1976) discuss the
distribution of the sample values of $r_s$ under the null that $y_t$ is stationary with normally
distributed errors. Allowing $\operatorname{var}(r_s)$ to denote the sampling variance of $r_s$, they obtained

$$\operatorname{var}(r_s) = \begin{cases} T^{-1} & \text{for } s = 1 \\ T^{-1}\Big(1 + 2\sum_{j=1}^{s-1} r_j^2\Big) & \text{for } s > 1 \end{cases} \tag{2.41}$$

if the true value of $r_s = 0$ [i.e., if the true data-generating process is an MA($s - 1$)
process]. Moreover, in large samples (i.e., for large values of $T$), $r_s$ will be normally
distributed with a mean equal to zero. For the PACF coefficients, under the null hypothesis
of an AR($p$) model (i.e., under the null that all $\phi_{p+i,p+i}$ are zero), the variance of the
$\hat{\phi}_{p+i,p+i}$ is approximately $1/T$ (see Section 2.3 of the Supplementary Manual).

In practice, we can use these sample values to form the sample autocorrelations
and partial autocorrelation functions and test for significance using (2.41). For example,
if we use a 95% confidence interval (i.e., two standard deviations), and the calculated
value of $r_1$ exceeds $2T^{-1/2}$, it is possible to reject the null hypothesis that the first-order
autocorrelation is not statistically different from zero. Rejecting this hypothesis means
rejecting an MA($s - 1$) = MA(0) process and accepting the alternative $q > 0$. Next, try
$s = 2$; $\operatorname{var}(r_2)$ is $(1 + 2r_1^2)/T$. If $r_1$ is 0.5 and $T$ is 100, the variance of $r_2$ is 0.015 and the
standard deviation is about 0.123. Thus, if the calculated value of $r_2$ exceeds 2(0.123),
it is possible to reject the hypothesis $r_2 = 0$. Here, rejecting the null means accepting
the alternative that $q > 1$. Repeating for the various values of $s$ is helpful in identifying
the order of the process. The maximum number of sample autocorrelations and partial
autocorrelations to use is typically set equal to $T/4$.
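
The sample autocorrelations in (2.40) and the variance in (2.41) can be computed as follows. This is a minimal sketch that assumes NumPy is available; the function names `sample_acf` and `bartlett_variance` are hypothetical, and the short series passed to `sample_acf` is made up purely to show the mechanics.

```python
import numpy as np


def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag as defined in (2.40)."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()                       # deviations from the sample mean
    denom = np.sum(d ** 2)
    return np.array([np.sum(d[s:] * d[:len(d) - s]) / denom
                     for s in range(1, max_lag + 1)])


def bartlett_variance(r, s, T):
    """var(r_s) from (2.41); r[0] holds r_1, r[1] holds r_2, and so on."""
    return (1.0 + 2.0 * sum(rj ** 2 for rj in r[: s - 1])) / T


# The numerical example in the text: r_1 = 0.5 and T = 100
var_r2 = bartlett_variance([0.5], 2, 100)
print(var_r2, var_r2 ** 0.5)

# A made-up series, used only to illustrate the call
print(sample_acf([1, 2, 3, 4, 5], 2))
```

Comparing each $r_s$ with two times the square root of `bartlett_variance(r, s, T)` mirrors the two-standard-deviation rule described above.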

Within any large group of autocorrelations, some will exceed two standard
deviations as a result of pure chance even though the true values in the data-generating
process are zero. The Q-statistic can be used to test whether a group of autocorrelations
is significantly different from zero. Box and Pierce (1970) used the sample
autocorrelations to form the statistic

$$Q = T \sum_{k=1}^{s} r_k^2$$

Under the null hypothesis that all values of $r_k = 0$, $Q$ is asymptotically $\chi^2$
distributed with $s$ degrees of freedom. The intuition behind the use of the statistic is that
high sample autocorrelations lead to large values of $Q$. Certainly, a white-noise process
(in which all autocorrelations should be zero) would have a $Q$ value of zero. If
the calculated value of $Q$ exceeds the appropriate value in a $\chi^2$ table, we can reject the
null of no significant autocorrelations. Note that rejecting the null means accepting an
alternative that at least one autocorrelation is not zero.

A problem with the Box-Pierce Q-statistic is that it works poorly even in moderately
large samples. Ljung and Box (1978) reported superior small sample performance
for the modified Q-statistic calculated as

$$Q = T(T + 2) \sum_{k=1}^{s} r_k^2/(T - k) \tag{2.42}$$

If the sample value of $Q$ calculated from (2.42) exceeds the critical value of $\chi^2$
with $s$ degrees of freedom, then at least one value of $r_k$ is statistically different from
zero at the specified significance level. The Box-Pierce and Ljung-Box Q-statistics
also serve as a check to see if the residuals from an estimated ARMA($p$, $q$) model
behave as a white-noise process. However, when the $s$ correlations from an estimated
ARMA($p$, $q$) model are formed, the degrees of freedom are reduced by the number of
estimated coefficients. Hence, using the residuals of an ARMA($p$, $q$) model, $Q$ has a
$\chi^2$ distribution with $s - p - q$ degrees of freedom (if a constant is included, the degrees
of freedom are $s - p - q - 1$).
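
Both Q-statistics are simple sums of squared sample autocorrelations. The sketch below is illustrative: the function names `box_pierce_q` and `ljung_box_q` are hypothetical, and the values $r_1 = 0.5$, $r_2 = 0.2$, and $T = 100$ are made-up inputs rather than estimates from any data set.

```python
def box_pierce_q(r, T):
    """Box-Pierce statistic Q = T * sum of r_k^2, with r = [r_1, ..., r_s]."""
    return T * sum(rk ** 2 for rk in r)


def ljung_box_q(r, T):
    """Ljung-Box statistic from (2.42): T (T + 2) * sum of r_k^2 / (T - k)."""
    return T * (T + 2) * sum(rk ** 2 / (T - k) for k, rk in enumerate(r, start=1))


r_example = [0.5, 0.2]     # hypothetical sample autocorrelations
T_example = 100            # hypothetical sample size
q_bp = box_pierce_q(r_example, T_example)
q_lb = ljung_box_q(r_example, T_example)
print(q_bp, q_lb)
```

Either value would then be compared with a $\chi^2$ critical value, using $s$ degrees of freedom for raw data or the reduced degrees of freedom described above when the autocorrelations come from ARMA residuals.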
Model Selection Criteria


One natural question to ask of any estimated model is: How well does it fit the data? 
Adding additional lags for p and/or q will necessarily reduce the sum of squares of 
the estimated residuals. However, adding such lags entails the estimation of additional 
coefficients and an associated loss of degrees of freedom. Moreover, the inclusion of 
extraneous coefficients will reduce the forecasting performance of the fitted model. As 
discussed in Appendix 2.2 in the Supplementary Manual, there exist various model
selection criteria that trade off a reduction in the sum of squares of the residuals for a
more parsimonious model. The two most commonly used model selection criteria are
the Akaike Information Criterion (AIC) and the Schwarz Bayesian Criterion (SBC).
In the text, we use the following formulas 


AIC = T ln(sum of squared residuals) + 2n
SBC = T ln(sum of squared residuals) + n ln(T)


where n = number of parameters estimated (p + q + possible constant term)
      T = number of usable observations.


When you estimate a model using lagged variables, some observations are lost. To 
adequately compare the alternative models, T should be kept fixed. Otherwise, you will 
be comparing the performance of the models over different sample periods. Moreover, 
decreasing T has the direct effect of reducing the AIC and the SBC; the goal is not to select
a model because it has the smallest number of usable observations. For example, with 
100 data points, estimate an AR(1) and an AR(2) using only the last 98 observations 
in each estimation. Compare the two models using T = 98. 

Ideally, the AIC and SBC will be as small as possible (note that both can be negative).
As the fit of the model improves, the AIC and SBC will approach −∞. We can
use these criteria to aid in selecting the most appropriate model; model A is said to fit
better than model B if the AIC (or SBC) for A is smaller than for model B. In using the
criteria to compare alternative models, we must estimate them over the same sample
period so that they will be comparable. For each, increasing the number of regressors
increases n but should have the effect of reducing the sum of squared residuals (SSR).
Thus, if a regressor has no explanatory power, adding it to the model will cause both the
AIC and SBC to increase. Since ln(T) will be greater than 2, the SBC will always select
a more parsimonious model than will the AIC; the marginal cost of adding regressors
is greater with the SBC than with the AIC.
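
The two formulas used in the text are straightforward to evaluate. The sketch below is an illustration only: the function names `aic` and `sbc` are hypothetical, and the sums of squared residuals (85 and 82) are made-up numbers chosen so that the two criteria disagree. Both models are compared over the same T = 98 usable observations, as recommended above.

```python
import math


def aic(ssr, n, T):
    """AIC = T ln(SSR) + 2n, the form used in the text."""
    return T * math.log(ssr) + 2 * n


def sbc(ssr, n, T):
    """SBC = T ln(SSR) + n ln(T), the form used in the text."""
    return T * math.log(ssr) + n * math.log(T)


T = 98
# Hypothetical results: AR(1) with intercept (n = 2) and AR(2) with intercept (n = 3)
aic_ar1, sbc_ar1 = aic(85.0, 2, T), sbc(85.0, 2, T)
aic_ar2, sbc_ar2 = aic(82.0, 3, T), sbc(82.0, 3, T)
print(aic_ar1, aic_ar2)   # the AIC is lower for the AR(2) in this made-up case
print(sbc_ar1, sbc_ar2)   # the SBC is lower for the AR(1) in this made-up case
```

In this made-up case the extra lag lowers T ln(SSR) by roughly 3.5, which is more than the AIC penalty of 2 but less than the SBC penalty of ln(98), so the SBC favors the smaller model.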

An especially useful feature of the model selection criteria is for comparing
non-nested models. For example, suppose you want to compare an AR(2) model to
an MA(3) model. Neither is a restricted form of the other. You would not want to
estimate an ARMA(2, 3) model and perform F-tests to determine whether $a_1 = a_2 = 0$
or whether $\beta_1 = \beta_2 = \beta_3 = 0$. As discussed in Appendix 2.1, the estimation of ARMA
models necessitates computer-based solution methods. If the AR(2) and MA(3)
models are each reasonable, the nonlinear search algorithms required to estimate an
ARMA(2, 3) model are not likely to converge to a solution. Moreover, the values of $y_{t-1}$
and $y_{t-2}$ are clearly correlated with the values of $\varepsilon_{t-1}$, $\varepsilon_{t-2}$, and $\varepsilon_{t-3}$. It is quite possible
that both the hypotheses could be accepted (or rejected). However, it is straightforward
to compare the estimated AR(2) and MA(3) models using the AIC or the SBC.

Of the two criteria, the SBC has superior large sample properties. Let the true order
of the data-generating process be $(p^*, q^*)$ and suppose that we use the AIC and SBC to
estimate all ARMA models of order $(p, q)$ where $p \ge p^*$ and $q \ge q^*$. Both the AIC
and the SBC will select models of orders greater than or equal to $(p^*, q^*)$ as the sample
size approaches infinity. However, the SBC is asymptotically consistent, whereas the AIC is
biased toward selecting an overparameterized model. In small samples, though, the
AIC can work better than the SBC. You can be quite confident in your results if both
the AIC and the SBC select the same model. If they select different models, you need
to proceed cautiously. Since SBC selects the more parsimonious model, you should
check to determine if the residuals appear to be white noise. Since the AIC can select
an overparameterized model, the t-statistics of all coefficients should be significant at
conventional levels. A number of other diagnostic checks that can be used to compare 
alternative models are presented in Sections 8 and 9. Nevertheless, it is wise to retain a 
healthy skepticism of your estimated models. With many data sets, it is just not possible 
to find the one model that clearly dominates all others. There is nothing wrong with 
reporting the results and the forecasts using alternative estimations. 

Before proceeding, be aware that a number of different ways are used to report the 
AIC and the SBC. For example, the software packages EViews and SAS report values 
for the AIC and SBC using 


$$\mathrm{AIC}^* = -2\ln(L)/T + 2n/T$$
$$\mathrm{SBC}^* = -2\ln(L)/T + n\ln(T)/T$$


where n and T are as defined above and L is the maximized value of the likelihood
function, so that $\ln(L)$ is the maximized value of the log likelihood.

For a normal distribution, $-2\ln(L) = T\ln(2\pi) + T\ln(\sigma^2) + (1/\sigma^2)(\mathrm{SSR})$. The
reason for the plethora of reporting methods is that many software packages (such as 
OX, RATS, and GAUSS) do not display any model selection criteria so that users must 
calculate these values by themselves. Programmers quickly find that coding all of the 
parameters contained in the formulas is unnecessary and simply report the shortened 
versions. In point of fact, it does not matter which method you use. If you work through 
Question 7 at the end of this chapter, it should be clear that the model with the smallest 
value for AIC will always have the smallest AIC*. Specifically, Question 7 asks you to 
write down the formula for In(L) and show that the equation for AIC* is a monotonic 
transformation of that for AIC. Hence, whether you use the formula for AIC or AIC*,
you will always be selecting the same model as the one selected in the text. The identical 
relationship holds between SBC* and SBC; the model yielding the smallest value for 
SBC will always have the smallest value for SBC*. 
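
The monotonic relationship can be sketched in a few lines. The derivation below is only an outline, and it rests on two assumptions that are not restated in this passage: that the criterion defined earlier in the chapter is AIC = T ln(SSR) + 2n, and that $\sigma^2$ in the normal log likelihood is replaced by its maximum likelihood estimate SSR/T.

```latex
% Assumptions: AIC = T ln(SSR) + 2n, and sigma^2 is evaluated at SSR/T.
\begin{align*}
-2\ln(L) &= T\ln(2\pi) + T\ln(\mathrm{SSR}/T) + T \\
\mathrm{AIC}^* &= \frac{-2\ln(L)}{T} + \frac{2n}{T}
   = \ln(2\pi) + 1 - \ln(T) + \frac{T\ln(\mathrm{SSR}) + 2n}{T} \\
 &= \frac{\mathrm{AIC}}{T} + \bigl[\ln(2\pi) + 1 - \ln(T)\bigr]
\end{align*}
```

Because every model is estimated over the same T observations, the bracketed term is a constant and 1/T is positive, so ranking models by AIC* gives the same ordering as ranking them by AIC. Replacing 2n with n ln(T) gives the corresponding result for SBC* and SBC.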


Estimation of an AR(1) Model 


Let us use a specific example to see how the sample autocorrelation function and partial
autocorrelation function can be used as an aid in identifying an ARMA model.
A computer program was used to draw 100 normally distributed random numbers
with a theoretical variance equal to unity. Call these random variates $\varepsilon_t$,
where $t$ runs from 1 to 100. Beginning with $t = 1$, values of $y_t$ were generated using
the formula $y_t = 0.7y_{t-1} + \varepsilon_t$ and the initial condition $y_0 = 0$. Note that
the problem of nonstationarity is avoided since the initial condition is consistent with
long-run equilibrium. Panel (a) of Figure 2.3 shows the sample correlogram and Panel (b)
shows the sample PACF. You should take a minute to compare the ACF and PACF to those of the
theoretical processes shown in Figure 2.2.

FIGURE 2.3 ACF and PACF for Two Simulated Processes
[Panel (a): ACF for the AR(1) process; Panel (b): PACF for the AR(1) process]
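
A simulation of this kind is easy to reproduce. The following is a minimal sketch, not the program used for the text: the function names and the seed are illustrative assumptions, and a different random draw will not reproduce the exact series Y1. The partial autocorrelations are obtained from the sample autocorrelations with the Durbin-Levinson recursion.

```python
import random

def simulate_ar1(a1, n, seed=0):
    """Generate y_t = a1*y_{t-1} + eps_t for t = 1..n, starting from y_0 = 0."""
    rng = random.Random(seed)              # the seed is an arbitrary choice
    y, prev = [], 0.0
    for _ in range(n):
        prev = a1 * prev + rng.gauss(0.0, 1.0)   # eps_t drawn from N(0, 1)
        y.append(prev)
    return y

def sample_acf(y, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t - s] for t in range(s, n)) / denom
            for s in range(1, max_lag + 1)]

def sample_pacf(y, max_lag):
    """Sample partial autocorrelations phi_11, ..., phi_ss (Durbin-Levinson)."""
    r = sample_acf(y, max_lag)
    pacf = [r[0]]
    prev = [r[0]]                          # prev[j-1] holds phi_{s-1, j}
    for s in range(2, max_lag + 1):
        num = r[s - 1] - sum(prev[j - 1] * r[s - j - 1] for j in range(1, s))
        den = 1.0 - sum(prev[j - 1] * r[j - 1] for j in range(1, s))
        phi_ss = num / den
        cur = [prev[j - 1] - phi_ss * prev[s - j - 1] for j in range(1, s)]
        cur.append(phi_ss)
        pacf.append(phi_ss)
        prev = cur
    return pacf

y = simulate_ar1(0.7, 100)
print("ACF :", [round(v, 2) for v in sample_acf(y, 5)])
print("PACF:", [round(v, 2) for v in sample_pacf(y, 5)])
```

With only 100 observations the sample values can differ noticeably from the theoretical ones, which is the point of the exercise; increasing `n` should pull the ACF toward 0.7, 0.49, 0.343, and so on.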

In practice, we never know the true data-generating process. As an exercise, suppose
we were presented with these 100 sample values and were asked to uncover the
true process using the Box-Jenkins methodology. The first step might be to compare
the sample ACF and PACF to those of the various theoretical models. The decaying pattern
of the ACF and the single large spike at lag 1 in the sample PACF suggest an AR(1)
model. The first three autocorrelations are $r_1 = 0.74$, $r_2 = 0.58$, and $r_3 = 0.47$.
[Note that these are somewhat greater than the theoretical values of 0.7, 0.49
($0.7^2 = 0.49$), and 0.343, respectively.] In the PACF, there is a sizable spike of 0.74
at lag 1, and all other partial autocorrelations (except for lag 12) are very small.

Under the null hypothesis of an MA(0) process, the standard deviation of $r_1$ is
$T^{-1/2} = 0.1$. Since the sample value of $r_1 = 0.74$ is more than seven standard
deviations from zero, we can reject the null hypothesis that $r_1$ equals 0. The standard
deviation of $r_2$ is obtained by applying (2.41) to the sampling data, where $s = 2$:

$$\mathrm{var}(r_2) = [1 + 2(0.74)^2]/100 = 0.021$$

Since $(0.021)^{1/2} = 0.1449$, the sample value of $r_2$ is more than three standard
deviations from zero; at conventional significance levels, we can reject the null hypothesis
that $r_2$ equals zero. We can similarly test the significance of the other values of the
autocorrelations.
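
The same calculation can be repeated for any lag. The sketch below assumes that (2.41) has the form used in the computation above, that is, var(r_s) = 1/T for s = 1 and (1 + 2 times the sum of the first s - 1 squared autocorrelations)/T for s > 1; the function name is illustrative.

```python
import math

def acf_variance(r, s, T):
    """Approximate var(r_s): 1/T for s = 1, else (1 + 2*sum_{j<s} r_j^2)/T.

    r is the list [r_1, r_2, ...] of sample autocorrelations.
    """
    return (1.0 + 2.0 * sum(rj * rj for rj in r[:s - 1])) / T

r = [0.74, 0.58, 0.47]   # sample autocorrelations reported in the text
T = 100
for s in (1, 2):
    sd = math.sqrt(acf_variance(r, s, T))
    print(f"lag {s}: std. dev. = {sd:.4f}, r_s / std. dev. = {r[s - 1] / sd:.2f}")
```

For lag 2 the unrounded standard deviation is slightly below the 0.1449 quoted above, because the text rounds the variance to 0.021 before taking the square root; the conclusion is unaffected.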

As you can see in Panel (b) of the figure, other than $\phi_{11}$, all partial autocorrelations
(except for lag 12) are less than $2T^{-1/2} = 0.2$. The decay of the ACF and the single
spike of the PACF give the strong impression of a first-order autoregressive model.
Nevertheless, if we did not know the true underlying process and happened to be using
monthly data, we might be concerned with the significant partial autocorrelation at
lag 12. After all, with monthly data, we might expect some direct relationship between
$y_t$ and $y_{t-12}$.

Although we know that the data was actually generated from an AR(1) process, it 
is illuminating to compare the estimates of two different models. Suppose we estimate 
an AR(1) model and try to capture the spike at lag 12 with an MA coefficient. Thus, 
we can consider the two tentative models 


Model 1: $y_t = a_1y_{t-1} + \varepsilon_t$
Model 2: $y_t = a_1y_{t-1} + \varepsilon_t + \beta_{12}\varepsilon_{t-12}$


Table 2.2 reports the results of the two estimations. The coefficient of model 1
satisfies the stability condition $|a_1| < 1$ and has a low standard error (the associated
t-statistic for a null of zero is more than 12). As a useful diagnostic check, we plot
the correlogram of the residuals of the fitted model in Figure 2.4. The correlogram
indicates that each one of the residual autocorrelations is less than two standard
deviations from zero. The Ljung-Box Q-statistics of these residuals indicate that, as a
group, lags 1 through 8, 1 through 16, and 1 through 24 are not significantly different
from zero. This is strong evidence that the AR(1) model “fits” the data well. After all, if
residual autocorrelations were significant, the AR(1) model would not use all available
information concerning movements in the $\{y_t\}$ sequence. For example, suppose we
wanted to forecast $y_{t+1}$ conditioned on all available information up to and including
period $t$. With model 1, the value of $y_{t+1}$ is $y_{t+1} = a_1y_t + \varepsilon_{t+1}$. Hence, the forecast from


Table 2.2 Estimates of an AR(1) Model


                               Model 1                       Model 2
                               $y_t = a_1y_{t-1} + \varepsilon_t$      $y_t = a_1y_{t-1} + \varepsilon_t + \beta_{12}\varepsilon_{t-12}$
Degrees of freedom             98                            97
Sum of squared residuals       85.10                         85.07
Estimated $a_1$                0.7904                        0.7938
(standard error)               (0.0624)                      (0.0643)
Estimated $\beta_{12}$                                       -0.0325
(standard error)                                             (0.1141)
AIC/SBC                        AIC = 441.9; SBC = 444.5      AIC = 443.9; SBC = 449.1
Ljung-Box Q-statistics for     Q(8) = 6.43 (0.490)           Q(8) = 6.48 (0.485)
the residuals (significance    Q(16) = 15.86 (0.391)         Q(16) = 15.75 (0.400)
level in parentheses)          Q(24) = 21.74 (0.536)         Q(24) = 21.56 (0.547)


FIGURE 2.4 ACF of Residuals from Model 1


model 1 is $a_1y_t$. If the residual autocorrelations had been significant, this forecast would
not capture all of the available information.

Examining the results for model 2, note that both models yield similar estimates
for the first-order autoregressive coefficient and the associated standard error. However,
the estimate for $\beta_{12}$ is of poor quality; the insignificant t-value suggests that it
should be dropped from the model. Moreover, comparing the AIC and the SBC values of the
two models suggests that the benefit of reducing the SSR is overwhelmed by the detrimental
effect of estimating an additional parameter. All of these indicators point to the
choice of model 1.
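
The trade-off can be checked directly from the sums of squared residuals in Table 2.2. The sketch below assumes the definitions given earlier in the chapter, AIC = T ln(SSR) + 2n and SBC = T ln(SSR) + n ln(T), and it assumes T = 99 usable observations (100 data points less the one lost to the lagged term); neither is restated in this passage.

```python
import math

def aic(ssr, n, T):
    """AIC = T*ln(SSR) + 2n (assumed definition from earlier in the chapter)."""
    return T * math.log(ssr) + 2 * n

def sbc(ssr, n, T):
    """SBC = T*ln(SSR) + n*ln(T) (assumed definition from earlier in the chapter)."""
    return T * math.log(ssr) + n * math.log(T)

T = 99                                   # assumption: 100 observations, one lost to the lag
models = {"Model 1": (85.10, 1),         # (SSR, number of estimated parameters)
          "Model 2": (85.07, 2)}
for name, (ssr, n) in models.items():
    print(name, "AIC =", round(aic(ssr, n, T), 1), "SBC =", round(sbc(ssr, n, T), 1))
```

Under these assumptions the output agrees with the AIC and SBC entries in Table 2.2: the drop in SSR from 85.10 to 85.07 lowers T ln(SSR) by only about 0.03, far less than the penalty of 2 (AIC) or ln(99) (SBC) for the extra coefficient.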

Exercise 8 at the end of this chapter entails various estimations using this series. 
The series is denoted by Y1 in the file SIM2.XLS. In this exercise, you are asked to 
show that the AR(1) model performs better than some alternative specifications. It is 
important that you complete this exercise. 


Estimation of an ARMA(1, 1) Model 


A second $\{y_t\}$ sequence in the file SIM2.XLS was constructed to illustrate the estimation
of an ARMA(1, 1). Given 100 normally distributed values of $\{\varepsilon_t\}$, 100 values of
$\{y_t\}$ were generated using


$$y_t = -0.7y_{t-1} + \varepsilon_t - 0.7\varepsilon_{t-1}$$


where $y_0$ and $\varepsilon_0$ were both set equal to zero.

Both the sample ACF and the PACF from the simulated data (see the second set of 
graphs in Figure 2.3) are roughly equivalent to those of the theoretical model shown in 
Figure 2.2. However, if the true data-generating process were unknown, the researcher 
might be concerned about certain discrepancies. An AR(2) model could yield a sample 
ACF and PACF similar to those in the figure. Table 2.3 reports the results of estimating 
the data using the following three models: 


Model 1: $y_t = a_1y_{t-1} + \varepsilon_t$
Model 2: $y_t = a_1y_{t-1} + \varepsilon_t + \beta_1\varepsilon_{t-1}$
Model 3: $y_t = a_1y_{t-1} + a_2y_{t-2} + \varepsilon_t$


Table 2.3 Estimates of an ARMA(1, 1) Model


           Estimates (1)              Q-Statistics (2)           AIC/SBC (3)
Model 1    $a_1$: -0.835 (0.053)      Q(8) = 26.19 (0.000);      AIC = 496.5;
                                      Q(24) = 41.10 (0.001)      SBC = 499.0
Model 2    $a_1$: -0.679 (0.076)      Q(8) = 3.86 (0.695);       AIC = 471.0;
           $\beta_1$: -0.676 (0.081)  Q(24) = 14.23 (0.892)      SBC = 476.2
Model 3    $a_1$: -1.16 (0.093)       Q(8) = 11.44 (0.057);      AIC = 482.8;
           $a_2$: -0.378 (0.092)      Q(24) = 22.59 (0.424)      SBC = 487.9


Notes:

(1) Standard errors in parentheses.

(2) Ljung-Box Q-statistics of the residuals from the fitted model. The significance levels are in parentheses.

(3) For comparability, the AIC and SBC values are reported for estimations that used only observations
3 through 100. If the AR(1) is estimated using 99 observations, the AIC and SBC are 502.3 and 504.9,
respectively. If the ARMA(1, 1) is estimated using 99 observations, the AIC and SBC are 476.6 and 481.1,
respectively.


In examining Table 2.3, notice that all of the estimated values of $a_1$ are highly
significant; each of the estimated values is at least eight standard deviations from zero. 
It is clear that the AR(1) model is inappropriate. The Q-statistics for model 1 indicate 
that there is significant autocorrelation in the residuals. The estimated ARMA(1, 1) 
model does not suffer from this problem. Moreover, both the AIC and the SBC select 
model 2 over model 1. 

The same type of reasoning indicates that model 2 is preferred to model 3. Note 
that, for each model, the estimated coefficients are highly significant and the point 
estimates imply convergence. Although the Q-statistic at 24 lags indicates that these 
two models do not suffer from correlated residuals, the Q-statistic at 8 lags indicates 
serial correlation in the residuals of model 3. Thus, the AR(2) model does not capture
short-term dynamics as well as the ARMA(1, 1) model does. Also note that the AIC and
SBC both select model 2. 


Estimation of an AR(2) Model


A third data series was simulated as

$$y_t = 0.7y_{t-1} - 0.49y_{t-2} + \varepsilon_t$$

The estimated ACF and PACF of the series are


Lags     Autocorrelations
1-10      0.47  -0.16  -0.32  -0.11  -0.05  -0.16  -0.10   0.13   0.18   0.03
11-20    -0.09  -0.11  -0.16  -0.06   0.12   0.25   0.05  -0.17  -0.15   0.01
         Partial Autocorrelations
1-10      0.47  -0.48   0.02   0.05  -0.25  -0.12   0.10   0.04  -0.08   0.02
11-20    -0.02  -0.14  -0.17   0.21   0.01   0.09  -0.22   0.01  -0.02  -0.03

Note the large autocorrelation at lag 16 and the large partial autocorrelations at
lags 14 and 17. Given the way the process was simulated, the presence of these
autocorrelations is due to nothing more than chance. However, an econometrician unaware
of the actual data-generating process might be concerned about these autocorrelations.
The estimated AR(2) model (with t-statistics in parentheses) is


$$y_t = \underset{(7.73)}{0.692}\,y_{t-1} - \underset{(-5.37)}{0.481}\,y_{t-2}$$
AIC = 219.87, SBC = 225.04


Overall, the model appears to be adequate. However, the two AR(2) coefficients
are unable to capture the correlations at very long lags. For example, the partial
autocorrelations of the residuals for lags 14 and 17 are both greater than 0.2 in absolute
value. The calculated Ljung-Box statistic for 16 lags is 24.6248 (which is significant
at the 0.038 level). At this point, it might be tempting to try to model the correlation at
lag 16 by including the moving average term $\beta_{16}\varepsilon_{t-16}$. Such an estimation results in


$$y_t = \underset{(7.87)}{0.717}\,y_{t-1} - \underset{(-5.11)}{0.465}\,y_{t-2} + \underset{(2.78)}{0.306}\,\varepsilon_{t-16}$$
AIC = 213.40, SBC = 221.16

All estimated coefficients are significant and the Ljung—Box Q-statistics for the 
residuals are all insignificant at conventional levels. In conjunction with the fact that 
the AIC and SBC both select this second model, the researcher unaware of the true pro- 
cess might be tempted to conclude that the data-generating process includes a moving 
average term at lag 16. 

A useful model check is to split the sample into two parts. If a coefficient is 
present in the data-generating process, its influence should be seen in both subsamples. 
If the simulated series is split into two parts, the ACF and PACF using observations 
50 through 100 follow: 


Lags     Autocorrelations
1-10      0.46  -0.21  -0.28   0.03   0.10  -0.15  -0.13   0.10   0.18   0.03
11-20    -0.01   0.01  -0.06  -0.09   0.04   0.21   0.06  -0.16  -0.18  -0.05
         Partial Autocorrelations
1-10      0.46  -0.53   0.19   0.06  -0.20  -0.13   0.23  -0.08   0.00   0.06
11-20     0.15  -0.26   0.03   0.15   0.04   0.00  -0.05  -0.01  -0.14  -0.08


As you can see, the size of the partial autocorrelations at lags 14 and 17 is dimin- 
ished. Now, estimating a pure AR(2) model over this second part of the sample yields 


$$y_t = \underset{(5.92)}{0.714}\,y_{t-1} - \underset{(-4.47)}{0.538}\,y_{t-2}$$
Q(8) = 7.83; Q(16) = 15.93; Q(24) = 26.06

All estimated coefficients are significant, and the Ljung-Box Q-statistics do
not indicate any significant autocorrelations in the residuals. The significance levels
of Q(8), Q(16), and Q(24) are 0.251, 0.317, and 0.249, respectively. In fact, this
model does capture the actual data-generating process quite well. In this example, the
large spurious autocorrelations of the long lags can be eliminated by changing the
sample period. Thus, it is hard to maintain that the correlation at lag 16 is meaningful.
Most sophisticated practitioners warn against trying to fit any model to the very long
lags. As you can infer from (2.41), the variance of $r_s$ can be sizable when $s$ is large.
Moreover, in small samples, a few “unusual” observations can create the appearance 
of significant autocorrelations at long lags. Since econometric estimation involves 
unknown data-generating processes, the more general point is that we always need to 
be wary of our estimated model. Fortunately, Box and Jenkins (1976) established a set 
of procedures that can be used to check a model’s adequacy. 


8. BOX-JENKINS MODEL SELECTION 


The estimates of the AR(1), ARMA(1, 1), and AR(2) models in the previous section
illustrate the Box-Jenkins (1976) strategy for appropriate model selection. Box and
Jenkins popularized a three-stage method aimed at selecting an appropriate model for
the purpose of estimating and forecasting a univariate time series. In the identification
stage, the researcher visually examines the time plot of the series, the autocorrelation
function, and the partial autocorrelation function. Plotting the time path of the $\{y_t\}$ sequence
provides useful information concerning outliers, missing values, and structural breaks 
in the data. Nonstationary variables may have a pronounced trend or appear to meander 
without a constant long-run mean or variance. Missing values and outliers can be cor- 
rected at this point. At one time, the standard practice was to first difference any series 
deemed to be nonstationary. Currently, there is a large body of literature regarding for- 
mal procedures to check for nonstationarity. We defer this discussion until Chapter 4 
and assume that we are working with stationary data. A comparison of the sample ACF 
and PACF to those of various theoretical ARMA processes may suggest several plausible
models. In the estimation stage, each of the tentative models is fit, and the various
$a_i$ and $\beta_i$ coefficients are examined. In this second stage, the goal is to select a
stationary and parsimonious model that has a good fit. The third stage involves diagnostic
checking to ensure that the residuals from the estimated model mimic a white-noise 
process. 


Parsimony 


A fundamental idea in the Box—Jenkins approach is the principle of parsimony. Parsi- 
mony (meaning sparseness or stinginess) should come as second nature to economists. 
Incorporating additional coefficients will necessarily increase fit (e.g., the value of $R^2$
will increase) at a cost of reducing degrees of freedom. Box and Jenkins argue that 
parsimonious models produce better forecasts than overparameterized models. A par- 
simonious model fits the data well without incorporating any needless coefficients. Cer- 
tainly, forecasters do not want to project poorly estimated coefficients into the future. 
The aim is to approximate the true data-generating process but not to pin down the 
exact process. The goal of parsimony suggested eliminating the MA(12) coefficient in 
the simulated AR(1) model above. 

In selecting an appropriate model, the econometrician needs to be aware that several
different models may have similar properties. As an extreme example, note that
the AR(1) model $y_t = 0.5y_{t-1} + \varepsilon_t$ has the equivalent infinite-order moving average
representation of $y_t = \varepsilon_t + 0.5\varepsilon_{t-1} + 0.25\varepsilon_{t-2} + 0.125\varepsilon_{t-3} + 0.0625\varepsilon_{t-4} + \cdots$. In most
samples, approximating this MA($\infty$) process with an MA(2) or MA(3) model will give
a very good fit. However, the AR(1) model is the more parsimonious model and is
preferred. As a quiz, you should show that this AR(1) model has the equivalent
representation of $y_t = 0.25y_{t-2} + 0.5\varepsilon_{t-1} + \varepsilon_t$.

In addition, be aware of the common factor problem. Suppose we wanted to fit 
the ARMA(2, 3) model 


$$(1 - a_1L - a_2L^2)y_t = (1 + \beta_1L + \beta_2L^2 + \beta_3L^3)\varepsilon_t$$   (2.43)


Suppose that $(1 - a_1L - a_2L^2)$ and $(1 + \beta_1L + \beta_2L^2 + \beta_3L^3)$ can be factored as
$(1 + cL)(1 + aL)$ and $(1 + cL)(1 + b_1L + b_2L^2)$, respectively. Since $(1 + cL)$ is a common
factor to each, (2.43) has the equivalent but more parsimonious form:


$$(1 + aL)y_t = (1 + b_1L + b_2L^2)\varepsilon_t$$   (2.44)


If you passed the last quiz, you know that $(1 - 0.25L^2)y_t = (1 + 0.5L)\varepsilon_t$ is equivalent
to $(1 + 0.5L)(1 - 0.5L)y_t = (1 + 0.5L)\varepsilon_t$ so that $y_t = 0.5y_{t-1} + \varepsilon_t$. In practice, the
polynomials will not factor exactly. However, if the factors are similar, you should try 
a more parsimonious form. 
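
The common factor in the quiz can be verified numerically. The sketch below is illustrative only (the helper names are not from the text): it multiplies the two lag polynomials and then compares the moving average weights implied by the ARMA(2, 1) form with those implied by the AR(1) form.

```python
def poly_mul(p, q):
    """Multiply lag polynomials given as coefficient lists [c0, c1, ...] in powers of L."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def psi_weights(ar, ma, n):
    """First n MA(infinity) weights of
    y_t = ar[0]*y_{t-1} + ar[1]*y_{t-2} + ... + eps_t + ma[0]*eps_{t-1} + ...
    """
    psi = []
    for j in range(n):
        if j == 0:
            w = 1.0
        else:
            w = ma[j - 1] if j - 1 < len(ma) else 0.0
            for i, a in enumerate(ar, start=1):
                if j - i >= 0:
                    w += a * psi[j - i]
        psi.append(w)
    return psi

# (1 + 0.5L)(1 - 0.5L) = 1 - 0.25L^2
print(poly_mul([1.0, 0.5], [1.0, -0.5]))

# y_t = 0.25*y_{t-2} + eps_t + 0.5*eps_{t-1}  versus  y_t = 0.5*y_{t-1} + eps_t
print(psi_weights([0.0, 0.25], [0.5], 5))
print(psi_weights([0.5], [], 5))
```

Both representations produce the weights 1, 0.5, 0.25, 0.125, 0.0625, so they describe the same process; only the AR(1) form is parsimonious.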

In order to ensure that the model is parsimonious, the various $a_i$ and $\beta_i$ should
all have t-statistics of 2.0 or greater in absolute value (so that each coefficient is significantly different
from zero at the 5% level). Moreover, the coefficients should not be strongly correlated 
with each other. Highly collinear coefficients are unstable; usually, one or more can be 
eliminated from the model without reducing forecast performance. 


Stationarity and Invertibility 


The distribution theory underlying the use of the sample ACF and PACF as approximations
to those of the true data-generating process assumes that the $\{y_t\}$ sequence is
stationary. Moreover, t-statistics and Q-statistics also presume that the data are stationary.
The estimated autoregressive coefficients should be consistent with this underlying
assumption. Hence, we should be suspicious of an AR(1) model if the estimated value
of $a_1$ is close to unity. For an ARMA(2, q) model, the characteristic roots of the estimated
polynomial $(1 - a_1L - a_2L^2)$ should lie outside of the unit circle.
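
For a second-order model the root condition can be checked with the quadratic formula. The following is a minimal sketch (function names are illustrative, and it requires $a_2 \neq 0$); it solves $1 - a_1z - a_2z^2 = 0$ for $z$ and tests whether both roots lie outside the unit circle, using the AR(2) estimates reported in the previous section as an example.

```python
import cmath

def ar2_inverse_roots(a1, a2):
    """Roots z of 1 - a1*z - a2*z**2 = 0 (requires a2 != 0).

    Equivalently a2*z**2 + a1*z - 1 = 0, solved with the quadratic formula.
    """
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return ((-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2))

def is_stationary_ar2(a1, a2):
    """True when both roots of the lag polynomial lie outside the unit circle."""
    return all(abs(z) > 1.0 for z in ar2_inverse_roots(a1, a2))

# Estimated AR(2) coefficients from the simulated series in the previous section
roots = ar2_inverse_roots(0.692, -0.481)
print([round(abs(z), 3) for z in roots], is_stationary_ar2(0.692, -0.481))

# A hypothetical pair of coefficients with a1 + a2 > 1 fails the check
print(is_stationary_ar2(0.5, 0.6))
```

For the estimated model the roots are a complex pair with modulus of roughly 1.44, comfortably outside the unit circle. Roots very close to one in modulus should be treated with caution, since floating-point error and sampling error both blur the boundary.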

As discussed in greater detail in Appendix 2.1, the Box-Jenkins approach also
necessitates that the model be invertible. Formally, $\{y_t\}$ is invertible if it can be
represented by a finite-order or convergent autoregressive process. Invertibility is important
because the use of the ACF and PACF implicitly assumes that the $\{y_t\}$ sequence can
be represented by an autoregressive model. As a demonstration, consider the simple
MA(1) model:


$$y_t = \varepsilon_t - \beta_1\varepsilon_{t-1}$$   (2.45)


so that if $|\beta_1| < 1$,

$$y_t/(1 - \beta_1L) = \varepsilon_t$$

or


$$y_t + \beta_1y_{t-1} + \beta_1^2y_{t-2} + \beta_1^3y_{t-3} + \cdots = \varepsilon_t$$   (2.46)


If $|\beta_1| < 1$, (2.46) can be estimated using the Box-Jenkins method. However, if
$|\beta_1| \ge 1$, the $\{y_t\}$ sequence cannot be represented by a finite-order or convergent
AR process; as such, it is not invertible. More generally, for an ARMA model to have a
convergent AR representation, the roots of the polynomial
$(1 + \beta_1L + \beta_2L^2 + \cdots + \beta_qL^q)$ must
lie outside the unit circle. Note that there is nothing improper about a noninvertible
model. The $\{y_t\}$ sequence implied by $y_t = \varepsilon_t - \varepsilon_{t-1}$ is stationary in that it has a constant
time-invariant mean ($Ey_t = Ey_{t-s} = 0$), a constant time-invariant variance
[$\mathrm{var}(y_t) = \mathrm{var}(y_{t-s}) = \sigma^2(1 + \beta_1^2) = 2\sigma^2$], and the autocovariances
$\gamma_1 = -\beta_1\sigma^2$ and all other $\gamma_s = 0$. The problem is that the technique does not
allow for the estimation of such models. If $\beta_1 = 1$, (2.46) becomes


$$y_t + y_{t-1} + y_{t-2} + y_{t-3} + y_{t-4} + \cdots = \varepsilon_t$$


Clearly, the autocorrelations and partial autocorrelations between $y_t$ and $y_{t-s}$ will
never decay.


Goodness of Fit 


A good model will fit the data well. Obviously, $R^2$ and the average of the residual
sum of squares are common goodness-of-fit measures in ordinary least squares. The 
problem with these measures is that the fit necessarily improves as more parameters 
are included in the model. Parsimony suggests using the AIC and/or SBC as more 
appropriate measures of the overall fit of the model. Also, be cautious of estimates that 
fail to converge rapidly. Most software packages estimate the parameters of an ARMA 
model using a nonlinear search procedure. If the search fails to converge rapidly, it is 
possible that the estimated parameters are unstable. In such circumstances, adding an 
additional observation or two can greatly alter the estimates. 


Postestimation Evaluation 


The third stage of the Box—Jenkins methodology involves diagnostic checking. The 
standard practice is to plot the residuals to look for outliers and evidence of periods 
in which the model does not fit the data well. One common practice is to create the 
standardized residuals by dividing each residual, $\varepsilon_t$, by its estimated standard
deviation, $\sigma$. If the residuals are normally distributed, the plot of the $\varepsilon_t/\sigma$ series should be
such that no more than 5% lie outside the band from -2 to +2. If the standardized
residuals seem to be much larger in some periods than in others, it may be evidence 
of structural change. If all plausible ARMA models show evidence of a poor fit dur- 
ing a reasonably long portion of the sample, it is wise to consider using intervention 
analysis, transfer function analysis, or any other of the multivariate estimation methods 
discussed in later chapters. If the variance of the residuals is increasing, a logarithmic 
transformation may be appropriate. Alternatively, you may wish to actually model any 
tendency of the variance to change using the ARCH techniques discussed in Chapter 3. 

It is particularly important that the residuals from an estimated model be serially
uncorrelated. Any evidence of serial correlation implies a systematic movement
in the $\{y_t\}$ sequence that is not accounted for by the ARMA coefficients included in
the model. Hence, any of the tentative models yielding nonrandom residuals should be
eliminated from consideration. To check for correlation in the residuals, construct the
ACF and the PACF of the residuals of the estimated model. You can then use (2.41)
and (2.42) to determine whether any or all of the residual autocorrelations or partial
autocorrelations are statistically significant. Although there is no significance level
that is deemed “most appropriate,” be wary of any model yielding (1) several residual
correlations that are marginally significant and (2) a Q-statistic that is barely significant
at the 10% level. In such circumstances, it is usually possible to formulate a better
performing model.
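
The Q-statistic used in this check is simple to compute from the residual autocorrelations. The sketch below assumes that (2.42) is the usual Ljung-Box form, Q = T(T + 2) times the sum over k of r_k^2/(T - k); the function name and the residual autocorrelations in the example are hypothetical, not values from the text.

```python
def ljung_box_q(r, T):
    """Ljung-Box statistic Q = T(T+2) * sum_{k=1}^{s} r_k^2 / (T - k).

    r is the list [r_1, ..., r_s] of residual autocorrelations and
    T is the number of usable observations.
    """
    return T * (T + 2) * sum(rk * rk / (T - k) for k, rk in enumerate(r, start=1))

# Hypothetical residual autocorrelations for lags 1 through 4
resid_acf = [0.05, -0.08, 0.02, 0.11]
print(round(ljung_box_q(resid_acf, 100), 3))
```

Under the null of white-noise residuals, Q is compared with a chi-square distribution; for the residuals of an estimated ARMA(p, q) model the degrees of freedom are usually taken to be s - p - q (less one more if a constant is included). A large Q relative to that critical value signals remaining serial correlation.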

Similarly, a model can be estimated over only a portion of the data set. The esti- 
mated model can then be used to forecast the known values of the series. The sum of 
the squared forecast errors is a useful way to compare the adequacy of alternative mod- 
els. Those models with poor out-of-sample forecasts should be eliminated. Some of the 
details in constructing out-of-sample forecasts are discussed in Section 9. 


9. PROPERTIES OF FORECASTS 


Perhaps the most important use of an ARMA model is to forecast future values of the
$\{y_t\}$ sequence. To simplify the discussion, it is assumed that the actual data-generating
process and the current and past realizations of the $\{\varepsilon_t\}$ and $\{y_t\}$ sequences are known to
the researcher. First, consider the forecasts from the AR(1) model $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$.
Updating one period, we obtain


$$y_{t+1} = a_0 + a_1y_t + \varepsilon_{t+1}$$


If you know the coefficients $a_0$ and $a_1$, you can forecast $y_{t+1}$ conditional on the
information available at period $t$ as


$$E_ty_{t+1} = a_0 + a_1y_t$$   (2.47)


where $E_ty_{t+j}$ is a short-hand way to write the conditional expectation of $y_{t+j}$ given the
information available at $t$. Formally, $E_ty_{t+j} = E(y_{t+j} \mid y_t, y_{t-1}, y_{t-2}, \ldots, \varepsilon_t, \varepsilon_{t-1}, \ldots)$.

In the same way, since $y_{t+2} = a_0 + a_1y_{t+1} + \varepsilon_{t+2}$, the forecast of $y_{t+2}$ conditioned
on the information available at period $t$ is


$$E_ty_{t+2} = a_0 + a_1E_ty_{t+1}$$


and using (2.47)

$$E_ty_{t+2} = a_0 + a_1(a_0 + a_1y_t)$$


Thus, the forecast of $y_{t+1}$ can be used to forecast $y_{t+2}$. The point is that forecasts
can be constructed using forward iteration; the forecast of $y_{t+j}$ can be used to forecast
$y_{t+j+1}$. Since $y_{t+j+1} = a_0 + a_1y_{t+j} + \varepsilon_{t+j+1}$, it immediately follows that

$$E_ty_{t+j+1} = a_0 + a_1E_ty_{t+j}$$   (2.48)

From (2.47) and (2.48), it should be clear that it is possible to obtain the entire
sequence of j-step-ahead forecasts by forward iteration. Consider


$$E_ty_{t+j} = a_0(1 + a_1 + a_1^2 + \cdots + a_1^{j-1}) + a_1^jy_t$$


This equation, called the forecast function, expresses all of the j-step-ahead forecasts
as a function of the information set in period $t$. Unfortunately, the quality of
the forecasts declines as we forecast further out into the future. Think of (2.48) as a
first-order difference equation in the $\{E_ty_{t+j}\}$ sequence. Since $|a_1| < 1$, the difference
equation is stable, and it is straightforward to find the particular solution to the
difference equation. If we take the limit of $E_ty_{t+j}$ as $j \to \infty$, we find that $E_ty_{t+j} \to a_0/(1 - a_1)$.
This result is really quite general: For any stationary ARMA model, the conditional
forecast of $y_{t+j}$ converges to the unconditional mean as $j \to \infty$.
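
Forward iteration and the forecast function are two ways of computing the same numbers, which a short sketch makes concrete. The function names and the parameter values in the example are illustrative assumptions.

```python
def ar1_forecasts_iterated(a0, a1, y_t, horizon):
    """Forecasts E_t y_{t+1}, ..., E_t y_{t+horizon} by iterating (2.48)."""
    forecasts, f = [], y_t
    for _ in range(horizon):
        f = a0 + a1 * f            # E_t y_{t+j+1} = a0 + a1 * E_t y_{t+j}
        forecasts.append(f)
    return forecasts

def ar1_forecast_closed_form(a0, a1, y_t, j):
    """Forecast function: a0*(1 + a1 + ... + a1^(j-1)) + a1^j * y_t."""
    return a0 * sum(a1 ** i for i in range(j)) + a1 ** j * y_t

a0, a1, y_t = 1.0, 0.5, 4.0        # hypothetical coefficients and current value
print(ar1_forecasts_iterated(a0, a1, y_t, 3))
print([ar1_forecast_closed_form(a0, a1, y_t, j) for j in (1, 2, 3)])
print(a0 / (1 - a1))               # unconditional mean that the forecasts approach
```

With these values both methods give 3.0, 2.5, and 2.25 for the first three horizons, and the forecasts decay geometrically toward the unconditional mean of 2.0.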

Because the forecasts from an ARMA model will not be perfectly accurate, it
is important to consider the properties of the forecast errors. Forecasting from time
period $t$, we can define the j-step-ahead forecast error, called $e_t(j)$, as the difference
between the realized value of $y_{t+j}$ and the forecasted value:


$$e_t(j) \equiv y_{t+j} - E_ty_{t+j}$$


Since the one-step-ahead forecast error is $e_t(1) = y_{t+1} - E_ty_{t+1} = \varepsilon_{t+1}$,
$e_t(1)$ is precisely the “unforecastable” portion of $y_{t+1}$, given the information
available in $t$.

To find the two-step-ahead forecast error, we need to form $e_t(2) = y_{t+2} - E_ty_{t+2}$.
Since $y_{t+2} = a_0 + a_1y_{t+1} + \varepsilon_{t+2}$ and $E_ty_{t+2} = a_0 + a_1E_ty_{t+1}$, it follows that


$$e_t(2) = a_1(y_{t+1} - E_ty_{t+1}) + \varepsilon_{t+2} = \varepsilon_{t+2} + a_1\varepsilon_{t+1}$$


You should take a few moments to demonstrate that, for the AR(1) model, the
j-step-ahead forecast error is given by


$$e_t(j) = \varepsilon_{t+j} + a_1\varepsilon_{t+j-1} + a_1^2\varepsilon_{t+j-2} + a_1^3\varepsilon_{t+j-3} + \cdots + a_1^{j-1}\varepsilon_{t+1}$$   (2.49)


Since the mean of (2.49) is zero, the forecasts are unbiased estimates of each value
$y_{t+j}$. The proof is trivial. Since $E_t\varepsilon_{t+j} = E_t\varepsilon_{t+j-1} = \cdots = E_t\varepsilon_{t+1} = 0$, the conditional
expectation of (2.49) is $E_te_t(j) = 0$. Since the expected value of the forecast error is
zero, the forecasts are unbiased.

Although unbiased, the forecasts from an ARMA model are necessarily inaccurate.
To find the variance of the forecast error, continue to assume that the elements of the
$\{\varepsilon_t\}$ sequence are independent with a variance equal to $\sigma^2$. Hence, from (2.49), the
variance of the forecast error is


$$\mathrm{var}[e_t(j)] = \sigma^2[1 + a_1^2 + a_1^4 + a_1^6 + \cdots + a_1^{2(j-1)}]$$   (2.50)


Thus, the one-step-ahead forecast error variance is $\sigma^2$, the two-step-ahead forecast
error variance is $\sigma^2(1 + a_1^2)$, and so forth. The essential point to note is that the variance
of the forecast error is an increasing function of $j$. As such, you can have more
confidence in short-term forecasts than in long-term forecasts. In the limit as $j \to \infty$,
the forecast error variance converges to $\sigma^2/(1 - a_1^2)$; hence, the forecast error variance
converges to the unconditional variance of the $\{y_t\}$ sequence.

Moreover, assuming that the $\{\varepsilon_t\}$ sequence is normally distributed, you can
place confidence intervals around the forecasts. The one-step-ahead forecast of $y_{t+1}$ is
$a_0 + a_1y_t$, and the variance of the forecast error is $\sigma^2$. As such, the 95% confidence
interval for the one-step-ahead forecast can be constructed as


$$a_0 + a_1y_t \pm 1.96\sigma$$


We can construct a confidence interval for the two-step-ahead forecast in
the same way. From (2.48), the two-step-ahead forecast is $a_0(1 + a_1) + a_1^2y_t$, and
(2.50) indicates that $\mathrm{var}[e_t(2)]$ is $\sigma^2(1 + a_1^2)$. Thus, the 95% confidence interval for the
two-step-ahead forecast is


$$a_0(1 + a_1) + a_1^2y_t \pm 1.96\sigma(1 + a_1^2)^{1/2}$$
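
The same construction extends to any horizon. The sketch below combines the forecast function with (2.50) to produce a 95% interval for the j-step-ahead forecast of an AR(1) process; the function names and the numerical values are illustrative, and the coefficients and $\sigma^2$ are treated as known, as in the discussion above.

```python
import math

def ar1_forecast(a0, a1, y_t, j):
    """j-step-ahead forecast: a0*(1 + a1 + ... + a1^(j-1)) + a1^j * y_t."""
    return a0 * sum(a1 ** i for i in range(j)) + a1 ** j * y_t

def ar1_forecast_error_variance(a1, sigma2, j):
    """(2.50): sigma^2 * (1 + a1^2 + a1^4 + ... + a1^(2(j-1)))."""
    return sigma2 * sum(a1 ** (2 * i) for i in range(j))

def ar1_forecast_interval(a0, a1, sigma2, y_t, j, z=1.96):
    """Interval: point forecast +/- z * sqrt(forecast error variance)."""
    point = ar1_forecast(a0, a1, y_t, j)
    half_width = z * math.sqrt(ar1_forecast_error_variance(a1, sigma2, j))
    return point - half_width, point + half_width

a0, a1, sigma2, y_t = 1.0, 0.5, 1.0, 4.0     # hypothetical values
for j in (1, 2, 10):
    low, high = ar1_forecast_interval(a0, a1, sigma2, y_t, j)
    print(f"j = {j}: [{low:.3f}, {high:.3f}]")
```

The interval widens with the horizon but not without bound: its half-width approaches 1.96 times the unconditional standard deviation, $\sigma/(1 - a_1^2)^{1/2}$. In practice the coefficients are estimated, so intervals built this way understate the true forecast uncertainty.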


Higher-Order Models 


To generalize the discussion, it is possible to use the iterative technique to derive
the forecasts for any ARMA(p, q) model. To keep the algebra simple, consider the
ARMA(2, 1) model


$$y_t = a_0 + a_1y_{t-1} + a_2y_{t-2} + \varepsilon_t + \beta_1\varepsilon_{t-1}$$   (2.51)

Updating one period yields

$$y_{t+1} = a_0 + a_1y_t + a_2y_{t-1} + \varepsilon_{t+1} + \beta_1\varepsilon_t$$


If we continue to assume that (1) all coefficients are known; (2) all variables
subscripted $t, t - 1, t - 2, \ldots$ are known at period $t$; and (3) $E_t\varepsilon_{t+j} = 0$ for $j > 0$, the
conditional expectation of $y_{t+1}$ is


$$E_ty_{t+1} = a_0 + a_1y_t + a_2y_{t-1} + \beta_1\varepsilon_t$$   (2.52)


Equation (2.52) is the one-step-ahead forecast of $y_{t+1}$. The one-step-ahead forecast
error is the difference between $y_{t+1}$ and $E_ty_{t+1}$ so that $e_t(1) = \varepsilon_{t+1}$. To find the
two-step-ahead forecast, update (2.51) by two periods:


$$y_{t+2} = a_0 + a_1y_{t+1} + a_2y_t + \varepsilon_{t+2} + \beta_1\varepsilon_{t+1}$$

The conditional expectation of $y_{t+2}$ is

$$E_ty_{t+2} = a_0 + a_1E_ty_{t+1} + a_2y_t$$   (2.53)


Equation (2.53) expresses the two-step-ahead forecast in terms of the one-step-ahead
forecast and current value of $y_t$. Combining (2.52) and (2.53) yields


E, Y2 = 4o + alao + ayy, + any,_-1 + BE] + ay, 
= a(l + a)) + [a] + aly, + ayany,-1 + a) Bie; 
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To find the two-step-ahead forecast error, subtract (2.53) from y,,,5. Thus, 


(2) = Ay (Your — E1) + E142 + Pr Erg (2.54) 


Since y,,, — £;);41 is the one-step-ahead forecast error, we can write the forecast 
error as 


(2) = (ay + BE + Er42 (2.55) 
Finally, all the j-step-ahead forecasts can be obtained from 
E Yj = do + EY 45-1 + EY- J 22 (2.56) 


Equation (2.56) demonstrates that the forecasts will satisfy a second-order difference
equation. As long as the characteristic roots of (2.56) lie inside the unit circle,
the forecasts will converge to the unconditional mean: $a_0/(1 - a_1 - a_2)$. We can use
(2.56) to find the j-step-ahead forecast errors. Since
$y_{t+j} = a_0 + a_1 y_{t+j-1} + a_2 y_{t+j-2} + \varepsilon_{t+j} + \beta_1 \varepsilon_{t+j-1}$,
the j-step-ahead forecast error is

$$e_t(j) = a_1(y_{t+j-1} - E_t y_{t+j-1}) + a_2(y_{t+j-2} - E_t y_{t+j-2}) + \varepsilon_{t+j} + \beta_1 \varepsilon_{t+j-1}$$
$$\phantom{e_t(j)} = a_1 e_t(j-1) + a_2 e_t(j-2) + \varepsilon_{t+j} + \beta_1 \varepsilon_{t+j-1}$$


It should be clear that forecasts from any stationary ARMA (p, q) process will even- 
tually satisfy the pth order difference equation comprising the homogeneous portion of 
the model. As such, the multistep-ahead forecasts will converge to the long-run mean 
of the series. 
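The recursion in (2.52) and (2.56) is easy to iterate. The sketch below assumes known coefficients; the function name and all numerical values are hypothetical, with the coefficients chosen so that the characteristic roots lie inside the unit circle.

```python
def arma21_forecasts(a0, a1, a2, b1, y_t, y_tm1, eps_t, horizon):
    """Iterate (2.52) and (2.56) to obtain E_t y_{t+1}, ..., E_t y_{t+horizon}."""
    forecasts = [a0 + a1 * y_t + a2 * y_tm1 + b1 * eps_t]   # (2.52)
    prev2, prev1 = y_t, forecasts[0]
    for _ in range(2, horizon + 1):
        nxt = a0 + a1 * prev1 + a2 * prev2                  # (2.56)
        forecasts.append(nxt)
        prev2, prev1 = prev1, nxt
    return forecasts

# Hypothetical ARMA(2, 1) coefficients and starting values
a0, a1, a2, b1 = 1.0, 0.5, 0.2, 0.4
path = arma21_forecasts(a0, a1, a2, b1, y_t=5.0, y_tm1=4.0, eps_t=0.5, horizon=200)
long_run_mean = a0 / (1 - a1 - a2)
print(path[:3], path[-1], long_run_mean)
```

The second element of `path` agrees with the closed-form two-step-ahead forecast obtained by combining (2.52) and (2.53), and the last element is numerically indistinguishable from $a_0/(1 - a_1 - a_2)$.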


Forecast Evaluation 


Now that you have estimated a series and have forecasted its future values, the obvious 
question is, “How good are my forecasts?” Typically, there will be several plausible 
models that you can select to use for your forecasts. Do not be fooled into thinking that 
the one with the best fit is the one that will forecast the best. To make a simple point, 
suppose you wanted to forecast the future values of the ARMA(2, 1) process given
by (2.51). If you could forecast the value of $y_{T+1}$ using (2.52), you would obtain the
one-step-ahead forecast error

$$e_T(1) = y_{T+1} - a_0 - a_1 y_T - a_2 y_{T-1} - \beta_1 \varepsilon_T = \varepsilon_{T+1} \tag{2.57}$$

Since the forecast error is the pure unforecastable portion of $y_{T+1}$, no other ARMA
model can provide you with superior forecasting performance. As such, it appears that
the “true” model will provide superior forecasts to those from any other possible model.
In practice, you will not know the actual order of the ARMA process or the actual
values of the coefficients of that process. Instead, to create out-of-sample forecasts,
it is necessary to use the estimated coefficients from what you believe to be the most
appropriate form of an ARMA model. Let a hat or caret (i.e., ^) over a parameter denote
the estimated value of a parameter, and let $\{\hat{\varepsilon}_t\}$ denote the residuals of the estimated
model. Hence, if you use the estimated model, the one-step-ahead forecast will be

$$E_T y_{T+1} = \hat{a}_0 + \hat{a}_1 y_T + \hat{a}_2 y_{T-1} + \hat{\beta}_1 \hat{\varepsilon}_T \tag{2.58}$$

and the one-step-ahead forecast error will be

$$e_T(1) = y_{T+1} - (\hat{a}_0 + \hat{a}_1 y_T + \hat{a}_2 y_{T-1} + \hat{\beta}_1 \hat{\varepsilon}_T)$$


Clearly, this forecast will not be identical to that from (2.57). When we forecast 
using (2.58), the coefficients (and the residuals) are estimated imprecisely. The fore- 
casts made using the estimated model extrapolate this coefficient uncertainty into the 
future. Since coefficient uncertainty increases as the model becomes more complex, it 
could be that an estimated AR(1) model forecasts the process given by (2.51) better than 
an estimated ARMA(2, 1) model. The general point is that large models usually contain 
in-sample estimation errors that induce forecast errors. As shown in the studies by Clark 
and West (2007), Dimitrios and Guerard (2004), and Liu and Enders (2003), forecasts 
using overly parsimonious models with little parameter uncertainty can provide better 
forecasts than models consistent with the actual data-generating process. Moreover, it 
is very difficult to construct confidence intervals for this type of forecast error. Not only 
is it necessary to include the effects of the stochastic variation in the future values of 
$\{y_{T+i}\}$, but also it is necessary to incorporate the fact that the coefficients are estimated
with error. 

How do you know which one of the several reasonable models has the best fore- 
casting performance? One way to answer this question is to put the alternative models 
to a head-to-head test. Since the future values of the series are unknown, you can hold 
back a portion of the observations from the estimation process. As such, you can esti- 
mate the alternative models over the shortened span of data and use these estimates to 
forecast the observations of the holdback period. You can then compare the properties 
of the forecast errors from the two models. To take a simple example, suppose that $\{y_t\}$
contains a total of 150 observations and that you are unsure as to whether an AR(1) or 
an MA(1) model best captures the behavior of the series. 

One way to proceed is to use the first 100 observations to estimate both models
and use each to forecast the value of $y_{101}$. Since you know the actual value of $y_{101}$, you
can construct the forecast error obtained from the AR(1) and from the MA(1). These
two forecast errors are precisely those that someone would have made if they had been
making a one-step-ahead forecast in period 100. Now, reestimate an AR(1) and an
MA(1) model using the first 101 observations. Although the estimated coefficients will
change somewhat, they are those that someone would have obtained in period 101.
Use the two models to forecast the value of $y_{102}$. Given that you know the actual value
of $y_{102}$, you can construct two more forecast errors. Since you know all values of the
$\{y_t\}$ sequence through period 150, you can continue this process so as to obtain two
series of one-step-ahead forecast errors, each containing 50 observations. To keep the
notation simple, let $\{f_{1i}\}$ and $\{f_{2i}\}$ denote the sequence of forecasts from the AR(1)
and the MA(1), respectively. If you understand the notation, it should be clear that
$f_{11} = E_{100}y_{101}$ is the first forecast using the AR(1) and $f_{2,50}$ is the last forecast from the MA(1).
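The holdback procedure can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated, the AR(1) is estimated by OLS, and, because estimating an MA(1) requires nonlinear methods, the second "model" is replaced here by a simple sample-mean forecast. The function names are hypothetical.

```python
import random

def ols_ar1(y):
    """OLS estimates (a0, a1) from regressing y_t on a constant and y_{t-1}."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    a1 = sxz / sxx
    return mz - a1 * mx, a1

def recursive_forecasts(y, first_origin):
    """One-step-ahead forecasts, reestimating with all data known at each origin."""
    f_ar1, f_mean, actual = [], [], []
    for origin in range(first_origin, len(y)):
        sample = y[:origin]                  # observations 1, ..., origin
        a0, a1 = ols_ar1(sample)
        f_ar1.append(a0 + a1 * sample[-1])   # AR(1) forecast of the next value
        f_mean.append(sum(sample) / origin)  # benchmark: sample mean
        actual.append(y[origin])
    return f_ar1, f_mean, actual

# Simulated series of 150 observations (hypothetical data-generating process)
random.seed(0)
y = [0.0]
for _ in range(149):
    y.append(0.5 + 0.8 * y[-1] + random.gauss(0, 1))

f1, f2, actual = recursive_forecasts(y, first_origin=100)
e1 = [a - f for a, f in zip(actual, f1)]     # 50 forecast errors, model 1
e2 = [a - f for a, f in zip(actual, f2)]     # 50 forecast errors, model 2
print(len(e1), sum(e ** 2 for e in e1) / 50, sum(e ** 2 for e in e2) / 50)
```

The key design point is that each forecast uses only the observations available at its origin, so the errors are those a forecaster would actually have made in real time.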

Obviously, it is desirable that the forecast errors have a mean near zero and a small 
variance. A regression-based method to assess the forecasts is to use the 50 forecasts 
from the AR(1) to estimate an equation of the form 


$$y_{100+i} = a_0 + a_1 f_{1i} + v_{1i}, \quad i = 1, \ldots, 50$$

If the forecasts are unbiased, an F-test should allow you to impose the restriction
$a_0 = 0$ and $a_1 = 1$. Similarly, the residual series $\{v_{1i}\}$ should act as a white-noise
process. It is a good idea to plot $\{v_{1i}\}$ to determine if there are periods in which your
forecasts are especially poor. Now repeat the process with the forecasts from the MA(1).
In particular, use the 50 forecasts from the MA(1) to estimate

$$y_{100+i} = b_0 + b_1 f_{2i} + v_{2i}, \quad i = 1, \ldots, 50$$

Again, if you use an F-test, you should not be able to reject the joint hypothesis
$b_0 = 0$ and $b_1 = 1$. If the significance levels from the two F-tests are similar, you might
select the model with the smallest residual variance; that is, select the AR(1) if
$\mathrm{var}(v_1) < \mathrm{var}(v_2)$.⁴
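The regression-based check can be written out directly. The sketch below assumes the standard restricted-versus-unrestricted form of the F-statistic with 2 restrictions and H − 2 residual degrees of freedom; the function name and the small data set are hypothetical.

```python
def unbiasedness_f_stat(actual, forecasts):
    """F-statistic for H0: a0 = 0 and a1 = 1 in actual = a0 + a1 * forecast + v."""
    h = len(actual)
    mf, my = sum(forecasts) / h, sum(actual) / h
    sff = sum((f - mf) ** 2 for f in forecasts)
    sfy = sum((f - mf) * (y - my) for f, y in zip(forecasts, actual))
    a1 = sfy / sff
    a0 = my - a1 * mf
    ssr_u = sum((y - a0 - a1 * f) ** 2 for f, y in zip(forecasts, actual))
    ssr_r = sum((y - f) ** 2 for f, y in zip(forecasts, actual))  # restriction imposed
    f_stat = ((ssr_r - ssr_u) / 2) / (ssr_u / (h - 2))
    return a0, a1, f_stat

# Hypothetical forecasts and two sets of outcomes
forecasts = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
unbiased = [f + n for f, n in zip(forecasts, noise)]
biased = [f + 2.0 + n for f, n in zip(forecasts, noise)]   # forecasts are 2 units too low

_, slope, f_ok = unbiasedness_f_stat(unbiased, forecasts)
_, _, f_bad = unbiasedness_f_stat(biased, forecasts)
print(round(slope, 4), round(f_ok, 3), round(f_bad, 1))
```

The statistic would be compared with an F critical value with (2, H − 2) degrees of freedom. A systematic bias in the forecasts inflates the restricted sum of squares and therefore the F-statistic.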

More generally, you might want to have a holdback period that differs from 
50 observations. If you have a large sample, it is possible to hold back as much as 
50% of the data set. Also, you might want to use the j-step-ahead forecasts instead 
of the one-step-ahead forecasts. For example, if you have quarterly data and want to 
forecast 1 year into the future, you can perform the analysis using the four-step-ahead 
forecasts. Once you have the two sequences of forecast errors, you can compare 
their properties. With a very small sample, it may not be possible to hold back 
many observations. Small samples are a problem since Ashley (2003) showed that 
very large samples are often necessary to reveal a significant difference between the 
out-of-sample forecasting performances of similar models. You need to have enough 
observations to have well-estimated coefficients for the in-sample period and enough 
out-of-sample forecasts so that the test has good power. 

Instead of focusing on the bias, many researchers would select the model 
with the smallest mean square prediction error (MSPE). Suppose you construct H 
one-step-ahead forecasts from two different models. Again, let $f_{1i}$ be the forecasts from
model 1 and $f_{2i}$ be the forecasts from model 2. Since we are using the one-step-ahead
forecasts, we can suppress the subscript j and denote the two series of forecast errors
as $e_{1i}$ and $e_{2i}$. As such, the MSPE of model 1 can be calculated as

$$\mathrm{MSPE} = \frac{1}{H}\sum_{i=1}^{H} e_{1i}^2$$

Several methods have been proposed to determine whether one MSPE is statistically
different from the other. If you put the larger of the two MSPEs in the numerator,
a standard recommendation is to use the F-statistic

$$F = \sum_{i=1}^{H} e_{1i}^2 \Big/ \sum_{i=1}^{H} e_{2i}^2 \tag{2.59}$$

The intuition is that the value of F will equal unity if the forecast errors from
the two models are identical. A very large value of F implies that the forecast errors
from the first model are substantially larger than those from the second. Under the null
hypothesis of equal forecasting performance, (2.59) has a standard F-distribution with
(H, H) degrees of freedom if the following three assumptions hold:
1. The forecast errors have zero mean and are normally distributed.
2. The forecast errors are serially uncorrelated. 
3. The forecast errors are contemporaneously uncorrelated with each other. 


Although it is common practice to assume that the $\{\varepsilon_t\}$ sequence is normally
distributed, it is not necessarily the case that the forecast errors are normally distributed
with a mean value of zero. Similarly, the forecast errors may be serially correlated; this is
particularly true if you use multistep-ahead forecasts. For example, equation (2.55)
indicated that the two-step-ahead forecast error for $y_{t+2}$ is

$$e_t(2) = (a_1 + \beta_1)\varepsilon_{t+1} + \varepsilon_{t+2}$$

and updating by one period yields the two-step-ahead forecast error for $y_{t+3}$:

$$e_{t+1}(2) = (a_1 + \beta_1)\varepsilon_{t+2} + \varepsilon_{t+3}$$

It should be clear that the two forecast errors are correlated. In particular,

$$E[e_t(2)e_{t+1}(2)] = (a_1 + \beta_1)\sigma^2$$

The point is that predicting $y_{t+2}$ from the perspective of period t and predicting $y_{t+3}$
from the perspective of period t + 1 both contain an error due to the presence of $\varepsilon_{t+2}$.
However, for i > 1, $E[e_t(2)e_{t+i}(2)] = 0$ since there are no overlapping forecasts. Hence,
the autocorrelations of the two-step-ahead forecast errors cut to zero after lag 1. You
should be able to demonstrate the general result that the j-step-ahead forecast errors act
as an MA(j − 1) process.
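A short simulation illustrates the MA(1) behavior of the two-step-ahead forecast errors. It is a sketch only: the coefficient values are hypothetical, and the errors are built directly from (2.55) using simulated normal shocks.

```python
import random

random.seed(1)
a1, b1, sigma = 0.5, 0.4, 1.0             # hypothetical ARMA(2, 1) values
n = 50_000
eps = [random.gauss(0, sigma) for _ in range(n + 2)]

# e_t(2) = (a1 + b1) * eps_{t+1} + eps_{t+2}, as in (2.55)
e2 = [(a1 + b1) * eps[t + 1] + eps[t + 2] for t in range(n)]

def autocov(x, lag):
    """Sample autocovariance of x at the given lag."""
    m = sum(x) / len(x)
    return sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag)) / len(x)

gamma1 = autocov(e2, 1)   # theory: (a1 + b1) * sigma**2
gamma2 = autocov(e2, 2)   # theory: 0
print(round(gamma1, 3), round(gamma2, 3))
```

The lag-1 autocovariance should be close to $(a_1 + \beta_1)\sigma^2$ and the lag-2 autocovariance close to zero, consistent with the autocorrelations cutting off after lag 1.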

Finally, the forecast errors from the two alternative models will usually be highly 
correlated with each other. For example, a negative realization of $\varepsilon_{t+1}$ will tend to cause
the forecasts from both models to be too high. Unfortunately, the violation of any one 
of these assumptions means that the ratio of the MSPEs in (2.59) does not have an 
F-distribution. 


THE GRANGER–NEWBOLD TEST Granger and Newbold (1976) show how to
overcome the problem of contemporaneously correlated forecast errors. If you have
H one-step-ahead forecast errors from each model, use the two sequences of forecast
errors to form

$$x_i = e_{1i} + e_{2i} \quad \text{and} \quad z_i = e_{1i} - e_{2i}, \quad i = 1, \ldots, H$$

Given that the first two assumptions above are valid, under the null hypothesis of
equal forecast accuracy, $x_i$ and $z_i$ should be uncorrelated. Consider:

$$\rho_{xz} = E x_i z_i = E(e_{1i}^2 - e_{2i}^2)$$

If the models forecast equally well, it follows that $E e_{1i}^2 = E e_{2i}^2$. Model 1 has a larger
MSPE if $\rho_{xz}$ is positive, and model 2 has a larger MSPE if $\rho_{xz}$ is negative. Let $r_{xz}$ denote
the sample correlation coefficient between $\{x_i\}$ and $\{z_i\}$. Granger and Newbold (1976)
show that if assumptions 1 and 2 hold

$$r_{xz} \Big/ \sqrt{(1 - r_{xz}^2)/(H - 1)} \tag{2.60}$$

has a t-distribution with H − 1 degrees of freedom. Thus, if $r_{xz}$ is statistically different
from zero, model 1 has a larger MSPE if $r_{xz}$ is positive, and model 2 has a larger MSPE
if $r_{xz}$ is negative.
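The statistic in (2.60) can be computed as follows. This is a minimal sketch that uses the usual (mean-adjusted) sample correlation coefficient, which is an assumption about how $r_{xz}$ is calculated; the function name and the simulated forecast errors are hypothetical.

```python
import math
import random

def granger_newbold(e1, e2):
    """Return (r_xz, t-statistic) for the Granger-Newbold test in (2.60)."""
    h = len(e1)
    x = [u + v for u, v in zip(e1, e2)]
    z = [u - v for u, v in zip(e1, e2)]
    mx, mz = sum(x) / h, sum(z) / h
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    r = sxz / math.sqrt(sxx * szz)
    t_stat = r / math.sqrt((1 - r ** 2) / (h - 1))
    return r, t_stat

# Simulated errors sharing a common shock; model 1 is noisier by construction
random.seed(2)
common = [random.gauss(0, 1) for _ in range(200)]
e1 = [c + 2 * random.gauss(0, 1) for c in common]
e2 = [c + random.gauss(0, 1) for c in common]

r_xz, t_stat = granger_newbold(e1, e2)
print(round(r_xz, 3), round(t_stat, 2))
```

Because the errors share a common component, they are contemporaneously correlated, which is exactly the situation the transformation to $x_i$ and $z_i$ is designed to handle. A positive and significant $r_{xz}$ points to model 1 having the larger MSPE.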


THE DIEBOLD–MARIANO TEST There is a very large literature trying to extend
the Granger–Newbold test so as to relax assumptions 1 and 2. Moreover, applied
econometricians might be interested in measures of forecasting performance other than 
the sum of squared errors. Indeed, it should be clear that using the sum of squared errors 
as a criterion makes sense only if the loss from making an incorrect forecast is quadratic. 
However, there are many other possibilities. For example, if your loss depends on the 
size of the forecast error, you should be concerned with the absolute values of the 
forecast errors. Alternatively, an options trader receives a payoff of zero if the value 
of the underlying asset lies below the strike price but receives a one-dollar payoff for 
each dollar the asset price rises above the strike price. In such a circumstance, the loss 
payoff is asymmetric. Diebold and Mariano (1995) have developed a test that relaxes 
assumptions 1 through 3 and allows for an objective function that is not quadratic.

As before, if we consider only one-step-ahead forecasts, we can eliminate the
subscript j. As such, we can let the loss from a forecast error in period i be denoted by
$g(e_i)$. In the typical case of mean-squared errors, the loss is $e_i^2$. Nevertheless, to allow
the loss function to be general, we can write the differential loss in period i from using
model 1 versus model 2 as $d_i = g(e_{1i}) - g(e_{2i})$. The mean loss can be obtained as

$$\bar{d} = \frac{1}{H}\sum_{i=1}^{H} [g(e_{1i}) - g(e_{2i})] \tag{2.61}$$

Under the null hypothesis of equal forecast accuracy, the value of $\bar{d}$ is zero. Since
$\bar{d}$ is the mean of the individual losses, under fairly weak conditions, the central limit
theorem implies that $\bar{d}$ should have a normal distribution. Hence, it is not necessary to
assume that the individual forecast errors are normally distributed. Thus, if we knew
$\mathrm{var}(\bar{d})$, we could construct the ratio $\bar{d}/\sqrt{\mathrm{var}(\bar{d})}$ and test the null hypothesis of equal
forecast accuracy using a standard normal distribution. In practice, the implementation
of the test is complicated by the fact that we need to estimate $\mathrm{var}(\bar{d})$.

If the $\{d_i\}$ series is serially uncorrelated with a sample variance of $\gamma_0$, the estimate
of $\mathrm{var}(\bar{d})$ is simply $\gamma_0/(H - 1)$. Since we use the estimated value of the variance, the
expression $\bar{d}/\sqrt{\gamma_0/(H - 1)}$ has a t-distribution with H − 1 degrees of freedom.

There is a very large literature on the best way to estimate the standard deviation
of $\bar{d}$ in the presence of serial correlation. Many of the technical details are not
appropriate here. Diebold and Mariano let $\gamma_i$ denote the ith autocovariance of the $\{d_i\}$ sequence.
Suppose that the first q values of $\gamma_i$ are different from zero. The variance of $\bar{d}$ can
be approximated by $\mathrm{var}(\bar{d}) = [\gamma_0 + 2\gamma_1 + \cdots + 2\gamma_q](H - 1)^{-1}$; the standard deviation
is the square root. As such, Harvey, Leybourne, and Newbold (1998) recommended
constructing the Diebold–Mariano (DM) statistic as

$$DM = \bar{d} \Big/ \sqrt{(\gamma_0 + 2\gamma_1 + \cdots + 2\gamma_q)/(H - 1)} \tag{2.62}$$

Compare the sample value of (2.62) to a t-statistic with H − 1 degrees of freedom.
As a practical matter, a simple way to proceed is to regress the $d_i$ on a constant and
use a t-test (with robust standard errors) to determine whether the constant is
statistically different from zero. Note that the construction of (2.62) can be sensitive to the
choice of q, and if one or more of the $\gamma_i < 0$, the estimated variance can be negative.
In such circumstances, it is preferable to use robust standard errors, such as those
in Newey and West (1987). All professional software packages allow you to directly
obtain the Newey–West estimator of the variance. Additional details are included in
the Supplementary Manual.
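A direct implementation of (2.62) is sketched below. It assumes that each $\gamma_k$ is the sample autocovariance of $\{d_i\}$ computed with divisor H, so that with q = 0 the statistic reduces to the ordinary t-statistic for the mean of $d_i$. The function name, its arguments, and the forecast errors are hypothetical.

```python
import math

def diebold_mariano(e1, e2, q=0, loss=lambda e: e ** 2):
    """DM statistic in the form of (2.62); q autocovariances of d_i are included."""
    d = [loss(u) - loss(v) for u, v in zip(e1, e2)]
    h = len(d)
    d_bar = sum(d) / h

    def gamma(k):
        # k-th sample autocovariance of the loss differential
        return sum((d[i] - d_bar) * (d[i - k] - d_bar) for i in range(k, h)) / h

    long_run = gamma(0) + 2 * sum(gamma(k) for k in range(1, q + 1))
    if long_run <= 0:
        raise ValueError("estimated variance is not positive; use robust standard errors")
    return d_bar / math.sqrt(long_run / (h - 1))

# Hypothetical one-step-ahead forecast errors from two models
e1 = [1.0, -2.0, 1.5, -0.5, 2.0, -1.0]
e2 = [0.5, -1.0, 1.0, -0.2, 0.8, -0.9]

dm_sq = diebold_mariano(e1, e2)                  # quadratic loss
dm_abs = diebold_mariano(e1, e2, loss=abs)       # absolute-value loss
print(round(dm_sq, 3), round(dm_abs, 3))
```

The explicit check on the sign of the estimated variance reflects the caution in the text: with q > 0, negative autocovariances can make the bracketed sum nonpositive, in which case a Newey–West type estimator is the safer choice.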

It is also possible to use the method for the j-step-ahead forecasts $e_{1i}(j)$ and $e_{2i}(j)$.
Construct each $d_i = g(e_{1i}(j)) - g(e_{2i}(j))$ and the mean $\bar{d}$. If you construct H forecast errors,
the DM statistic is

$$DM = \bar{d} \Big/ \sqrt{(\gamma_0 + 2\gamma_1 + \cdots + 2\gamma_q)/[H + 1 - 2j + H^{-1}j(j - 1)]}$$


An example showing the appropriate use of the Granger–Newbold and
Diebold–Mariano tests is provided in Section 10. Nevertheless, before proceeding,
a strong word of caution is in order. Clark and McCracken (2001) show that the
Granger–Newbold and Diebold–Mariano tests have a t-distribution only when the
underlying forecasting models are not nested. For example, the tests might not work
well when comparing forecasts from an AR(1) model to those obtained from an
ARMA(2, 1) model. Clearly, the AR(1) can be obtained from the ARMA(2, 1)
specification by setting $a_2 = \beta_1 = 0$. The problem with nested models is that under the
null hypothesis of equal MSPEs (so that the data are generated by the small model), 
the two models should predict equally well. However, the large model will always 
contain some extra error as it contains unnecessary parameters. Hence, if you want 
to test whether the data are actually generated from the different models, you need to 
control for the parameter uncertainty. 

Clark and West (2007) develop a simple procedure to adjust the forecast errors 
from the large model so that a simple variant of the DM statistic can be used with 
nested models. To continue with the notation developed above, denote the H forecasts
from model 1 as $f_{1i}$ and the forecast errors as $e_{1i}$. Similarly, the H forecasts and forecast
errors from model 2 are $f_{2i}$ and $e_{2i}$, respectively. Let model 1 be nested within model 2.
Given that the models are nested, the sole reason for any discrepancy between $f_{1i}$ and
$f_{2i}$ is due to parameter estimation error. If this estimation error is subtracted from $e_{2i}^2$,
the adjusted forecast errors can be used as the basis for the modified DM test. Consider
the $z_i$ series constructed from the squares of these errors as

$$z_i = (e_{1i})^2 - [(e_{2i})^2 - (f_{1i} - f_{2i})^2], \quad i = 1, \ldots, H$$

Allowing for parameter uncertainty, under the null hypothesis that the two models
predict equally well, the expected value of $z_i$ should be zero. Under the alternative
hypothesis, the data are generated from model 2. Hence, to perform the test, regress the
$z_i$ series on a constant. Since the test is one sided, if the t-statistic for the constant exceeds
1.645, reject the null hypothesis of equal forecast accuracy at the 5% significance level. If you
reject the null hypothesis, conclude that the data are generated from model 2. Otherwise, the data
are more likely to be generated from model 1. If the $\{z_i\}$ series is serially correlated,
you should perform the test with a robust t-statistic, such as that in the study by Newey
and West (1987).
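The adjusted series and the one-sided test can be sketched as follows. The code uses the ordinary (non-robust) t-statistic from a regression on a constant, which is appropriate only if $\{z_i\}$ is serially uncorrelated. The function name and the simulated forecasts are hypothetical: the small model always forecasts zero, while the large model uses an informative signal.

```python
import math
import random

def clark_west(e1, e2, f1, f2):
    """t-statistic on the constant when z_i is regressed on a constant.

    Model 1 (errors e1, forecasts f1) is nested within model 2 (e2, f2).
    """
    z = [u ** 2 - (v ** 2 - (a - b) ** 2) for u, v, a, b in zip(e1, e2, f1, f2)]
    h = len(z)
    z_bar = sum(z) / h
    s2 = sum((zi - z_bar) ** 2 for zi in z) / (h - 1)
    return z_bar / math.sqrt(s2 / h)

# Simulated example in which the larger model is the data-generating process
random.seed(3)
signal = [random.gauss(0, 1) for _ in range(400)]
actual = [s + random.gauss(0, 1) for s in signal]
f1 = [0.0] * 400            # small model: forecast is always zero
f2 = signal                 # large model: forecast uses the signal
e1 = [a - f for a, f in zip(actual, f1)]
e2 = [a - f for a, f in zip(actual, f2)]

cw_t = clark_west(e1, e2, f1, f2)
print(round(cw_t, 2), cw_t > 1.645)
```

Since the large model generates the data in this simulation, the t-statistic should comfortably exceed the one-sided 5% critical value of 1.645.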


10. A MODEL OF THE INTEREST RATE SPREAD 


The term “textbook example” is supposed to connote a very clear-cut illustration. 
If you are looking for a textbook example of the Box–Jenkins methodology, go
back to Section 7 or turn to Question 11 at the end of this chapter. In practice, we
rarely find a data series that precisely conforms to a theoretical ACF or PACF. This
section is intended to illustrate some of the ambiguities that can be encountered
when using the Box–Jenkins technique. These ambiguities may lead two equally
skilled econometricians to estimate and forecast the same series using very different 
ARMA processes. Many view the necessity of relying on the researcher’s judgment 
and experience as a serious weakness of a procedure that is designed to be scientific. 
Yet, if you make reasonable choices, you will select models that come very close to 
mimicking the actual data-generating process. 

It is useful to illustrate the Box–Jenkins modeling procedure by estimating a
quarterly model of the spread between a long-term and a short-term interest rate.
Specifically, the interest rate spread ($s_t$) can be formed as the difference between
the interest rate on 5-year U.S. government bonds and the rate on 3-month treasury 
bills. The data used in this section are the series labeled R5 and TBILL in the file 
QUARTERLY.XLS. Exercise 12 at the end of this chapter will help you to reproduce 
the results reported below. 

Panel (a) of Figure 2.5 shows the spread over the period from 1960Q1 to 2012Q4.
Although there are a few instances in which the spread is negative, the difference
between long- and short-term rates is generally positive (the sample mean is 1.21).
Notice that the series shows a fair amount of persistence in that the durations when
the spread is above or below the mean can be quite lengthy. Moreover, there do not
appear to be any major structural breaks (such as a permanent jump in the mean or
variance) in that the dynamic nature of the process seems to be constant over time. As
such, it is quite reasonable to suppose that the $\{s_t\}$ sequence is covariance stationary.
In contrast, as shown in Panel (b), the first difference of the spread seems to be very
erratic. As you will verify in Exercise 12, the $\Delta s_t$ series has little informational content
that can be used to forecast its future values. As such, it seems reasonable to estimate a
model of the $\{s_t\}$ sequence without any further transformations. Nevertheless, because
there are several large positive and negative jumps in the value of $s_t$, some researchers
might want to transform it so as to diminish its volatility. A reasonable number of such
shocks might indicate a departure from the assumption that the errors are normally
distributed. Although a logarithmic or a square root transformation is impossible because
some realizations of $s_t$ are negative, one could dampen the series using $y_t = \log(s_t + 3)$.
The point is that you should always maintain a healthy skepticism of the accuracy of
your model since the behavior of the data-generating process may not fully conform to
the underlying assumptions of the methodology.
FIGURE 2.5 Time Path of the Interest Rate Spread [Panel (a): The interest rate spread; Panel (b): First difference of the spread]


FIGURE 2.6 ACF and PACF of the Spread

Before reading on, you should examine the autocorrelation and partial
autocorrelation functions of the $\{s_t\}$ sequence shown in Figure 2.6. Try to identify the
tentative models that you would want to estimate. Recall that the theoretical ACF of a
pure MA(q) process cuts off to zero at lag q, and the theoretical ACF of an AR(1)
model decays geometrically. Examination of Figure 2.6 suggests that neither of these
specifications perfectly describes the sample data. In selecting your set of plausible
models, also note the following:


1. The ACF and PACF converge to zero quickly enough that we do not have to
worry about a time-varying mean. As suggested above, we do not want to
overdifference the data and try to model the $\{\Delta s_t\}$ sequence.

2. The ACF does not cut to zero so that we can rule out a pure MA(q) process.

3. The ACF is not really suggestive of a pure AR(1) process in that the decay
does not appear to be geometric. The value of $\rho_1$ is 0.857, and the values of
$\rho_2$, $\rho_3$, and $\rho_4$ are 0.678, 0.550, and 0.411, respectively.

4. The estimated values of the PACF are such that $\phi_{11} = 0.858$, $\phi_{22} = -0.217$,
$\phi_{33} = 0.112$, and $\phi_{44} = -0.188$. Although $\phi_{55}$ is close to zero, $\phi_{66} = -0.151$
and $\phi_{77} = 0.136$. Recall that, under the null hypothesis of a pure AR(p)
model, the variance of $\phi_{p+i,p+i}$ is approximately equal to 1/T. Since there
are 212 total observations, the values of $\phi_{22}$, $\phi_{44}$, and $\phi_{66}$ are more than
two standard deviations from zero (i.e., $2/212^{0.5} = 0.137$). In a pure AR(p)
model, the PACF cuts to zero after lag p. Hence, if the $s_t$ series follows a pure
AR(p) process, the value of p could be as high as six or seven.

5. There appears to be an oscillating pattern in the PACF in that the first seven
values alternate in sign. Oscillating decay of the PACF is characteristic of a
positive MA coefficient.
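The two-standard-deviation rule in item 4 is simple to apply mechanically. The sketch below uses the PACF values quoted in the list; the variable names are hypothetical, and the value of $\phi_{55}$ is omitted because the text does not report it.

```python
import math

T = 212
# Partial autocorrelations quoted in the list above (lag: value)
pacf = {1: 0.858, 2: -0.217, 3: 0.112, 4: -0.188, 6: -0.151, 7: 0.136}

band = 2 / math.sqrt(T)   # two standard deviations under the AR(p) null
significant = [lag for lag, phi in pacf.items() if abs(phi) > band]
print(round(band, 4), significant)
```

The seventh partial autocorrelation falls just inside the band, which is why the appropriate lag length is described as six or seven rather than pinned down exactly.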


Due to the number of small and marginally significant coefficients, the ACF and PACF
of the spread are probably more ambiguous than most of those you will encounter.
Hence, suppose you do not know where to start and estimate the $s_t$ series using a pure
AR(p) model. To illustrate the point, if you estimate the $s_t$ series as an AR(7) process,
you should obtain the estimates given in column 2 of Table 2.4. If you examine the
table, you will find that all of the t-statistics on the first six lags exceed 1.96 in absolute
value (indicating that the coefficients are significant at the 5% level). Since the t-statistic on
the coefficient for $y_{t-7}$ is 1.93, it is unclear as to whether to include the seventh lag. The
sum of squared residuals (SSR) is 43.86 and the AIC and SBC are 791.10 and 817.68,
respectively. The significance levels of the Q-statistics for lags 4, 8, and 12 indicate no
remaining autocorrelation in the residuals.

Although the AR(7) model has some desirable attributes, one reasonable estimation
strategy is to eliminate the seventh lag and estimate an AR(6) model over the same
sample period. [Note that the data set begins in 1960Q1, so that with seven lags the
estimation of the AR(7) begins in 1961Q4.] Although the autocorrelations of the residuals
are such that $\rho_8 = 0.20$, the significance levels of the Q(4), Q(8), and Q(12) statistics
(equal to 0.29, 10.93, and 16.75) are 0.99, 0.21, and 0.16, respectively. As such,
the Q-statistics suggest that you should not try to account for the residual
autocorrelation at lag 8. Although $a_5$ appears to be statistically insignificant, it is
generally not a good idea to use t-statistics to eliminate intermediate lags. As such, most
researchers would not eliminate the fifth lag and estimate a model with lags 1 through
4 and lag 6. Recall that the appropriate use of a t-statistic requires that the regressor in
question be uncorrelated with the other regressors. Given the autoregressive nature of
the series, $y_{t-5}$ is certainly correlated with $y_{t-4}$ and $y_{t-6}$. The overall result is that the
diagnostic checks of the AR(6) model suggest that it is adequate. In comparing the
AR(6) and AR(7) models, the AIC selects the AR(7) model, whereas the SBC selects
the more parsimonious AR(6) model.

Table 2.4 Estimates of the Interest Rate Spread

              AR(7)      AR(6)      AR(2)    p=1,2,7  ARMA(1,1)  ARMA(2,1)  p=2;ma=(1,7)
a0             1.20       1.20       1.19       1.19       1.19       1.19          1.20
             (6.57)     (7.55)     (6.02)     (6.80)     (6.16)     (5.56)        (5.74)
a1             1.11       1.09       1.05       1.04       0.76       0.43          0.36
            (15.76)    (15.54)    (15.25)    (14.83)    (14.69)     (2.78)        (3.15)
a2            -0.45      -0.43      -0.22      -0.20                  0.31          0.38
            (-4.33)    (-4.11)    (-3.18)    (-2.80)                (2.19)        (3.52)
a3             0.40       0.36
             (3.68)     (3.39)
a4            -0.30      -0.25
            (-2.70)    (-2.30)
a5             0.22       0.16
             (2.02)     (1.53)
a6            -0.30      -0.15
            (-2.86)    (-2.11)
a7             0.14                            -0.03
             (1.93)                          (-0.77)
β1                                                         0.38       0.69          0.77
                                                         (5.23)     (5.65)        (9.62)
β7                                                                                 -0.14
                                                                                 (-3.27)
SSR           43.86      44.68      48.02      47.87      46.93      45.76         43.72
AIC          791.10     792.92     799.67     801.06     794.96     791.81        784.46
SBC          817.68     816.18     809.63     814.35     804.93     805.10        801.07
Q(4)           0.18       0.29       8.99       8.56       6.63       1.18          0.76
Q(8)           5.69      10.93      21.74      22.39      18.48      12.27          2.60
Q(12)         13.67      16.75      29.37      29.16      24.38      19.14         11.13

Notes:

To ensure comparability, each equation was estimated over the 1961Q4–2012Q4 period.

Values in parentheses are the t-statistics for the null hypothesis that the estimated coefficient is equal
to zero. SSR is the sum of squared residuals. Q(n) are the Ljung–Box Q-statistics of the residual
autocorrelations.

For ARMA models, many software packages do not actually report the intercept term $a_0$. Instead, they
report the estimated mean of the process, $\mu_y$, along with the t-statistic for the null hypothesis that $\mu_y = 0$. The
historical reason for this convention is that it was easier to first demean the data and then estimate the
ARMA coefficients than to estimate all values in one step. If your software package reports a constant
term approximately equal to 0.216, it is reporting the estimated intercept.
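The AIC and SBC entries of Table 2.4 can be reproduced from the reported SSR values. The sketch below assumes the forms AIC = T ln(SSR) + 2n and SBC = T ln(SSR) + n ln(T), where n counts the estimated parameters including the intercept, and T = 205 usable observations (1961Q4 through 2012Q4). These assumptions are not restated in this section, but they match the table to within rounding of the SSR.

```python
import math

def aic_sbc(ssr, n_params, T):
    """Return (AIC, SBC) with AIC = T ln(SSR) + 2n and SBC = T ln(SSR) + n ln(T)."""
    base = T * math.log(ssr)
    return base + 2 * n_params, base + n_params * math.log(T)

T = 205   # quarterly observations from 1961Q4 through 2012Q4
# (SSR from Table 2.4, number of estimated coefficients including the intercept)
models = {"AR(7)": (43.86, 8), "AR(6)": (44.68, 7), "ARMA[2,(1,7)]": (43.72, 5)}
criteria = {name: aic_sbc(ssr, n, T) for name, (ssr, n) in models.items()}

for name, (aic, sbc) in criteria.items():
    print(name, round(aic, 2), round(sbc, 2))
```

Because ln(205) is a little over 5.3, the SBC charges far more per parameter than the AIC's penalty of 2, which is why the SBC prefers the AR(6) to the AR(7) even though the AIC ranks them the other way.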

Suppose that you try a very parsimonious model and estimate an AR(2). As you
can see from the fourth column of the table, the AIC selects the AR(7) model, but SBC
selects the AR(2) model. However, the residual autocorrelations from the AR(2) are
problematic in that

  ρ1     ρ2     ρ3     ρ4     ρ5     ρ6     ρ7     ρ8
 0.03  -0.13   0.16   0.01   0.08  -0.10  -0.14   0.16


The Q-statistics from the AR(2) model indicate significant autocorrelation in the 
residuals at the shorter lags. As such, it should be eliminated from further consideration. 
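The Q-statistics can be approximated from the residual autocorrelations listed above. The sketch assumes the usual Ljung–Box form $Q = T(T + 2)\sum_{k=1}^{s} r_k^2/(T - k)$ with T = 205; because the autocorrelations are reported to only two decimal places, the results are close to, but not identical to, the Q(4) = 8.99 and Q(8) = 21.74 shown for the AR(2) in Table 2.4.

```python
def ljung_box(autocorrs, T):
    """Q = T (T + 2) * sum over k of r_k^2 / (T - k) for the supplied autocorrelations."""
    return T * (T + 2) * sum(r ** 2 / (T - k) for k, r in enumerate(autocorrs, start=1))

# Residual autocorrelations of the AR(2) model, as reported in the text
resid_acf = [0.03, -0.13, 0.16, 0.01, 0.08, -0.10, -0.14, 0.16]

q4 = ljung_box(resid_acf[:4], T=205)
q8 = ljung_box(resid_acf, T=205)
print(round(q4, 2), round(q8, 2))
```

Most of the Q(4) value comes from the autocorrelations at lags 2 and 3, which is the sense in which the AR(2) leaves significant autocorrelation at the shorter lags.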

If you examined the AR(7) carefully, you might have noticed that $a_3$ almost offsets
$a_4$ and that $a_5$ almost offsets $a_6$ (since $a_3 + a_4 \approx 0$ and $a_5 + a_6 \approx 0$). If you reestimate
the model without $s_{t-3}$, $s_{t-4}$, $s_{t-5}$, and $s_{t-6}$, you should obtain the results given in
column 5 of Table 2.4. Since the coefficient for $s_{t-7}$ is now statistically insignificant, it
might seem preferable to use the AR(2) instead. Yet, the AR(2) has been shown to be 
inadequate relative to the AR(7) and the AR(6) models. 

Even though the AR(6) and AR(7) models perform relatively well, they are not 
necessarily the best forecasting models. There are several possible alternatives since 
the patterns of the ACF and PACF are not immediately clear. Results for a number of 
models with MA terms are shown in columns 6, 7, and 8 of Table 2.4: 


1. From the decaying ACF, someone might try to estimate the ARMA(1, 1)
model reported in column 6 of the table. The estimated value of $a_1$ (0.76)
is statistically different from zero and is almost five standard deviations
from unity. The estimated value of $\beta_1$ (0.38) is statistically different from
zero and implies that the process is invertible. Notice that the SBC from the
ARMA(1, 1) is smaller than that of the AR(7) and the AR(6). Nevertheless,
the ARMA(1, 1) specification is inadequate because of remaining serial
correlation in the residuals. The Ljung–Box Q-statistic for four lags of the
residuals (equal to 6.63) has a significance level of 15.7%. As such, we
cannot reject the null that Q(4) = 0 at any conventional significance level.
However, the Q(8) and Q(12) statistics indicate that the residuals from this
model exhibit substantial serial autocorrelation. As such, we must eliminate
the ARMA(1, 1) model from consideration.

2. Since the ACF decays and the PACF seems to oscillate beginning with lag
2 ($\phi_{22} = -0.217$), it seems plausible to estimate an ARMA(2, 1) model.
As shown in column 7 of the table, the model is an improvement over the
ARMA(1, 1) specification. The estimated coefficients ($a_1 = 0.43$ and $a_2 = 0.31$)
are each significantly different from zero at conventional levels and
imply characteristic roots inside the unit circle. The AIC selects the ARMA(2, 1)
model over the AR(6) and the SBC selects the ARMA(2, 1) over the AR(6)
and the AR(7). The values for Q(4), Q(8), and Q(12) indicate that the
autocorrelations of the residuals are not statistically significant at the 5% level.
Consider the ACF of the residuals:

  ρ1     ρ2     ρ3     ρ4     ρ5     ρ6     ρ7     ρ8
 0.01   0.01  -0.07  -0.02  -0.03  -0.08  -0.15   0.15

In order to account for the serial correlation at lag 7, it might seem plausible
to add an MA term to the model at lag 7. As given in the last column of the
table, all of the estimated coefficients are of high quality. In particular, the
coefficient for $\beta_7$ has a t-statistic of −3.27. The estimated values of $a_1$ and $a_2$
are similar to those of the ARMA(2, 1) model. Again, the Q-statistics indicate
that the autocorrelations of the residuals are not significant at conventional
levels. Both the AIC and SBC select the ARMA[2,(1,7)] specification over
any of the other models. You can easily verify that the MA coefficient at lag 7
provides a better fit than an AR coefficient at lag 7 and that an ARMA[2,(1,8)]
model is inadequate.


Although the ARMA[2,(1,7)] model appears to be quite reasonable, other 
researchers might have selected a decidedly different model. Consider some of the 
alternatives listed below. 


1. Parsimony versus Overfitting: In Section 7, we examined the issue of
fitting an MA coefficient at lag 16 to a true AR(2) process. If you reexamine the
example, you can understand why some researchers shy away from estimating
a model with long lag lengths that are disjoint from those of other periods.
In the example of the spread, the problem with the ARMA(2, 1) model
is that there was a small amount of residual autocorrelation around lag 7 or
8. The addition of the MA coefficient at lag 7 yielded a model with a better
fit and remedied the serial correlation problem. However, is it really plausible
that $\varepsilon_{t-7}$ has a direct effect on the current value of the interest rate spread
while lags 3, 4, 5, and 6 have no direct effects? In other words, do the markets
for securities work in such a way that what happens 7 quarters in the past has a
larger effect on today’s interest rates than events occurring in the more recent
past? Moreover, as you can verify by estimating the ARMA[2,(1,7)] model,
the t-statistic for $\beta_7$ over the 1982Q1–2012Q4 period is equal to 0.60 and is
not statistically significant. Notice that Panel (b) of Figure 2.5 suggests that
the volatility of the spread in the late 1970s and early 1980s is not typical of
the entire sample. It could be the case that the realizations from this period are
anomalies that have large effects on the coefficient estimates and their standard
errors. Thus, even though the AIC and SBC select the ARMA[2,(1,7)]
model over the ARMA(2, 1) model, some researchers would prefer the latter.
More generally, overfitting refers to a situation in which an equation
is fit to some of the idiosyncrasies present in a particular sample that are
not actually representative of the data-generating process. In applied work, 
no data set will perfectly correspond to every assumption required for the 
Box—Jenkins methodology. Since it is not always clear which characteristics 
of the sample are actually present in the data-generating process, the attempt 
to expand a model so as to capture every feature of the data may lead to 
overfitting. 
2. Volatility: Given the volatility of the $\{s_t\}$ series during the late 1970s and
early 1980s, transforming the spread using some sort of a square root or
logarithmic transformation might be appropriate. Moreover, the $s_t$ series has a
number of sharp jumps, indicating that the assumption of normality might be
violated. For a constant c such that $s_t + c$ is always positive, transformations
such as $\ln(s_t + c)$ or $(s_t + c)^{0.5}$ yield series with less volatility than the $s_t$ series
itself. Alternatively, it is possible to model the difference between the log of
the 5-year rate and the log of the 3-month rate.

A general class of transformations was proposed by Box and Cox (1964).
Suppose that all values of $\{y_t\}$ are positive so that it is possible to construct
the transformed $\{y_t^*\}$ sequence as

$$y_t^* = \begin{cases} (y_t^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\ \ln(y_t), & \lambda = 0 \end{cases}$$

The common practice is to transform the data using a preselected value
of $\lambda$. The selection of a value for $\lambda$ that is close to zero acts to “smooth” the
sequence. An ARMA model can be fitted to the transformed data. Although
some software programs have the capacity to simultaneously estimate $\lambda$
along with the other parameters of the ARMA model, this approach has fallen
out of fashion. Instead, it is possible to actually model the variance using the
methods discussed in Chapter 3.
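
The transformations described above can be sketched in a few lines of Python. This is only an illustration: the function names `box_cox` and `shifted_log` and the sample values are hypothetical and are not taken from the text or from the data file.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a strictly positive sequence for a preselected lambda."""
    if any(v <= 0 for v in y):
        raise ValueError("all values must be positive")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def shifted_log(s, c):
    """ln(s_t + c), where c is chosen so that s_t + c > 0 for every t."""
    if any(v + c <= 0 for v in s):
        raise ValueError("c is too small: s_t + c must be positive")
    return [math.log(v + c) for v in s]

# Hypothetical spread values (not the series used in the text).
spread = [-0.5, 0.2, 1.4, 3.1, 0.9]
print(box_cox([v + 1.0 for v in spread], 0.5))  # square-root-type transform
print(shifted_log(spread, 1.0))                 # logarithmic transform
```

As $\lambda$ approaches zero, the first branch of the formula approaches the logarithm, which is why the two cases fit together.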

3. Trends: Suppose that the span of the data had been somewhat different in
that the first observation was for 1973Q1 and the last was for 2004Q4. If you
examine Panel (a) of Figure 2.4, you can see that someone might be confused
and believe that the data contained an upward trend. Their misinterpretation
of the data might be reinforced by the fact that the ACF converges to zero
rather slowly. As such, they might have estimated a model of the $\Delta s_t$ series.
Others might have detrended the data using a deterministic time trend.


Out-of-Sample Forecasts 


We can assess the forecasting performance of the AR(7) and ARMA[2,(1,7)] models by
examining their bias and mean square prediction errors. Given that the data set contains
a total of 205 (i.e., 205 = 212 - 7) usable observations, it is possible to use a holdback
period of 50 observations. This way, there are at least 155 observations in each of the
estimated models and an adequate number of out-of-sample forecasts. First, the two
models were estimated using all available observations through 2000Q2 and the two
one-step-ahead forecasts were obtained. The actual value of $s_{2000:3}$ is 0.40; the AR(7)
predicted a value of 0.697, and the ARMA[2,(1,7)] model predicted a value of 0.591.
Thus, the forecast of the ARMA[2,(1,7)] is superior to that of the AR(7) for this
first period. An additional 49 forecasts were obtained for periods 2000Q4 to 2012Q4.
Let $e_{1t}$ denote the forecast errors from the AR(7) model and $e_{2t}$ denote the forecast
errors from the ARMA[2,(1,7)] model. The mean of $e_{1t}$ is 1.239, the mean of $e_{2t}$ is
1.244, and the estimated variances are var($e_{1t}$) = 0.797 and var($e_{2t}$) = 0.780. As such,
the bias of the AR(7) is slightly smaller while the ARMA[2,(1,7)] has the smaller MSPE.

To ascertain whether these differences are statistically significant, we first check
the bias. Let the $\{f_{1t}\}$ series contain the 50 forecasts of the AR(7) model and let $\{f_{2t}\}$
contain the 50 forecasts from the ARMA[2,(1,7)] model. Beginning with t = 2000Q3,
we can estimate the two regression equations:


$$s_t = 0.0594 + 0.968 f_{1t} \quad \text{and} \quad s_t = 0.004 + 1.004 f_{2t}$$


For the AR(7) model, the F-statistic for the restriction that the intercept equals 
zero and the slope equals unity is 0.110 with significance level of 0.896. Clearly, the 
restriction of unbiased forecasts does not appear to be binding. For the ARMA[2,(1,7)] 
model, the F-statistic is 0.014 with a significance level of 0.986. Hence, there is strong 
evidence that both models have unbiased forecasts. 
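
A minimal sketch of this bias check follows. It regresses the realized values on a constant and the forecasts and forms the F-statistic for the joint restriction that the intercept is zero and the slope is unity. The function name `unbiasedness_f` and the illustrative data are assumptions; the data are not the forecasts used in the text.

```python
def unbiasedness_f(actual, forecast):
    """OLS of actual on (1, forecast); F-test of intercept = 0 and slope = 1."""
    h = len(actual)
    f_bar = sum(forecast) / h
    s_bar = sum(actual) / h
    sxx = sum((f - f_bar) ** 2 for f in forecast)
    sxy = sum((f - f_bar) * (s - s_bar) for f, s in zip(forecast, actual))
    slope = sxy / sxx
    intercept = s_bar - slope * f_bar
    # Unrestricted SSR: residuals from the fitted regression.
    ssr_u = sum((s - intercept - slope * f) ** 2 for f, s in zip(forecast, actual))
    # Restricted SSR: imposing intercept = 0 and slope = 1 leaves s_t - f_t.
    ssr_r = sum((s - f) ** 2 for f, s in zip(forecast, actual))
    f_stat = ((ssr_r - ssr_u) / 2.0) / (ssr_u / (h - 2))
    return intercept, slope, f_stat

# Hypothetical forecasts and outcomes for a 50-period holdback sample.
forecasts = [0.1 * i for i in range(50)]
outcomes = [f + (0.1 if i % 2 == 0 else -0.1) for i, f in enumerate(forecasts)]
print(unbiasedness_f(outcomes, forecasts))
```

The statistic is compared with an F distribution with (2, H - 2) degrees of freedom; a small value means the restriction of unbiased forecasts is not binding.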

Next, consider the Granger-Newbold test for equal mean square prediction errors.
Form the $x_t$ and $z_t$ series as $x_t = e_{1t} + e_{2t}$ and $z_t = e_{1t} - e_{2t}$, respectively. The correlation
coefficient between $x_t$ and $z_t$ is $r_{xz} = 0.234$. Given that there are 50 observations in the
holdback period, form the Granger-Newbold statistic


$$\frac{r_{xz}}{\sqrt{(1 - r_{xz}^2)/(H - 1)}} = \frac{0.234}{\sqrt{(1 - (0.234)^2)/49}} = 1.69$$


With 49 degrees of freedom, a value of t = 1.69 is not statistically significant. We
can conclude that the forecasting performance of the AR(7) is not statistically different
from that of the ARMA[2,(1,7)].
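
The calculation can be sketched as follows. The helper names `gn_from_corr` and `gn_statistic` are hypothetical, and the use of the ordinary (mean-centered) sample correlation is an assumption. With $r_{xz} = 0.234$ and H = 50, the function returns a value close to the 1.69 reported above.

```python
import math

def gn_from_corr(r_xz, h):
    """Granger-Newbold statistic from the correlation of x and z and H forecasts."""
    return r_xz / math.sqrt((1.0 - r_xz ** 2) / (h - 1))

def gn_statistic(e1, e2):
    """Build x = e1 + e2 and z = e1 - e2, then return (r_xz, statistic)."""
    x = [a + b for a, b in zip(e1, e2)]
    z = [a - b for a, b in zip(e1, e2)]
    h = len(x)
    x_bar, z_bar = sum(x) / h, sum(z) / h
    sxz = sum((a - x_bar) * (b - z_bar) for a, b in zip(x, z))
    sxx = sum((a - x_bar) ** 2 for a in x)
    szz = sum((b - z_bar) ** 2 for b in z)
    r_xz = sxz / math.sqrt(sxx * szz)
    return r_xz, gn_from_corr(r_xz, h)

# Values quoted in the text: r_xz = 0.234 with a 50-observation holdback period.
print(gn_from_corr(0.234, 50))
```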

Since the $\{e_{1t}\}$ and $\{e_{2t}\}$ series contain only a small amount of serial correlation, we
obtain virtually the same answer using the DM statistic. Oftentimes, forecasters are
concerned about the MSPE. However, there are many other possibilities. In Exercise 12
at the end of this chapter, you will be asked to use the mean absolute error. Now, to
illustrate the use of the DM test, suppose that the cost of a forecast error rises extremely
quickly in the size of the error. In such circumstances, the loss function might be best
represented by the forecast error raised to the fourth power. Hence,


$$d_t = (e_{1t})^4 - (e_{2t})^4 \qquad (2.63)$$


The mean value of the $\{d_t\}$ sequence from (2.63) (i.e., $\bar{d}$) is 0.01732, and the
estimated variance is 0.002466. Since H = 50, we can form the DM statistic


$$DM = 0.01732/(0.002466/49)^{1/2} = 2.441$$


The null hypothesis is that the models have equal forecasting accuracy, and the
alternative hypothesis is that the forecast errors from the ARMA[2,(1,7)] are smaller than
those of the AR(7). With 49 degrees of freedom, the t-value of 2.441 is significant
at the 1.829% level. Hence, there is evidence in favor of the ARMA[2,(1,7)] model. If
there is serial correlation in the $\{d_t\}$ series, we need to use the specification in (2.63).
Toward this end, we would want to select the statistically significant values of $\gamma_q$. The
autocorrelations of $d_t$ are


Lag i        1      2      3      4      5      6      7      8      9     10     11     12
$\rho_i$  -0.10  -0.15   0.26   0.01   0.36   0.00  -0.09   0.13   0.06   0.05  -0.08   0.07


Q(4) = 5.53; Q(8) = 14.76; and Q(12) = 15.93

Although $\rho_5$ is large, many applied econometricians would dismiss it as spurious.
It does not seem plausible that the correlations $\rho_4$ and $\rho_6$ are actually very close to zero
while the correlation between $d_t$ and $d_{t-5}$ is very large. Moreover, the Ljung-Box Q(4),
Q(8), and Q(12) statistics do not indicate that the autocorrelations are significant. The
significance levels are 0.237, 0.064, and 0.195, respectively. Nevertheless, if you do
estimate the long-run variance using (2.63) with five lags, you should find that DM =
1.848 (so that the MSPEs are not statistically different from each other). The example
underscores the point made earlier that there is no clear answer as to the best way
to measure the long-run variance of d in the presence of serial correlation. The more
general result is that the two models are not substantially different from each other.
Both should provide reasonable forecasts.
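
The DM calculations in this example can be sketched as below. The function names are hypothetical. The long-run variance is taken to be $\gamma_0 + 2\gamma_1 + \cdots + 2\gamma_q$ with autocovariances divided by H, and the statistic divides that quantity by H - 1 as in the worked calculation above; both conventions are assumptions, since other software may scale differently.

```python
import math

def dm_from_moments(d_bar, long_run_var, h):
    """DM statistic from the mean of d, its (long-run) variance, and H forecasts."""
    return d_bar / math.sqrt(long_run_var / (h - 1))

def dm_statistic(e1, e2, power=4, q=0):
    """DM statistic for the loss |e|**power, using q autocovariances of d."""
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(e1, e2)]
    h = len(d)
    d_bar = sum(d) / h

    def gamma(j):
        # j-th sample autocovariance of the loss differential
        return sum((d[t] - d_bar) * (d[t - j] - d_bar) for t in range(j, h)) / h

    long_run_var = gamma(0) + 2.0 * sum(gamma(j) for j in range(1, q + 1))
    if long_run_var <= 0:
        raise ValueError("estimated long-run variance is not positive")
    return dm_from_moments(d_bar, long_run_var, h)

# Values quoted in the text: mean 0.01732, variance 0.002466, H = 50.
print(dm_from_moments(0.01732, 0.002466, 50))
```

Setting `q=0` ignores serial correlation in the loss differential; a positive `q` adds the selected autocovariances, which is why the statistic can change materially when lag 5 is included.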


11. SEASONALITY 


Many economic processes exhibit some form of seasonality. The agricultural, construc- 
tion, and travel sectors have obvious seasonal patterns resulting from their dependence 
on the weather. Similarly, the Thanksgiving-to-Christmas holiday season has a pro- 
nounced influence on the retail trade. In fact, the seasonal variation of a series may 
account for the preponderance of its total variance. Forecasts that ignore important 
seasonal patterns will have a high variance. 

Too many people fall into the trap of ignoring seasonality if they are working 
with deseasonalized or seasonally adjusted data. Suppose you collect a data set that 
the U.S. Census Bureau has “seasonally adjusted” using its X-11, X-12, or X-13
methods. In principle, the seasonally adjusted data should have the seasonal pattern
removed. However, caution is necessary. Although a standardized procedure may be 
necessary for a government agency reporting hundreds of series, the procedure might 
not be best for an individual wanting to model a single series. Even if you use season- 
ally adjusted data, a seasonal pattern might remain. This is particularly true if you 
do not use the entire span of data; the portion of the data used in your study can 
display more (or less) seasonality than the overall span. There is another important 
reason to be concerned about seasonality when using deseasonalized data. Implicit 
in any method of seasonal adjustment is a two-step procedure. First, the seasonality 
is removed, and second, the autoregressive and moving average coefficients are
estimated using Box-Jenkins techniques. As surveyed in Bell and Hillmer (1984), often
the seasonal and the ARMA coefficients are best identified and estimated jointly. In 
such circumstances, it is wise to avoid using seasonally adjusted data. 


Models of Seasonal Data 


The Box-Jenkins technique for modeling seasonal data is only a bit different from
that of nonseasonal data. The twist introduced by seasonal data of period s is that the
seasonal coefficients of the ACF and PACF appear at lags s, 2s, 3s, ..., rather than at
lags 1, 2, 3, .... For example, two purely seasonal models for quarterly data might be


$$y_t = a_4 y_{t-4} + \varepsilon_t, \quad |a_4| < 1 \qquad (2.64)$$

and

$$y_t = \varepsilon_t + \beta_4 \varepsilon_{t-4} \qquad (2.65)$$


You can easily convince yourself that the theoretical correlogram for (2.64) is such
that $\rho_i = (a_4)^{i/4}$ if i/4 is an integer and $\rho_i = 0$ otherwise; thus, the ACF exhibits decay
at lags 4, 8, 12, .... For model (2.65), the ACF exhibits a single spike at lag 4, and all
other correlations are zero.
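
The two theoretical correlograms can be tabulated with a short sketch. The function names are hypothetical. The height of the lag-4 spike for the seasonal MA model, $\beta_4/(1 + \beta_4^2)$, is the standard moving-average result and is not stated in the passage above.

```python
def acf_seasonal_ar(a4, max_lag, s=4):
    """Theoretical ACF of y_t = a4*y_{t-s} + e_t for lags 1..max_lag."""
    if abs(a4) >= 1:
        raise ValueError("stationarity requires |a4| < 1")
    return [a4 ** (i // s) if i % s == 0 else 0.0 for i in range(1, max_lag + 1)]

def acf_seasonal_ma(beta4, max_lag, s=4):
    """Theoretical ACF of y_t = e_t + beta4*e_{t-s} for lags 1..max_lag."""
    spike = beta4 / (1.0 + beta4 ** 2)
    return [spike if i == s else 0.0 for i in range(1, max_lag + 1)]

print(acf_seasonal_ar(0.5, 12))  # nonzero only at lags 4, 8, and 12
print(acf_seasonal_ma(0.5, 12))  # nonzero only at lag 4
```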

In practice, identification will be complicated by the fact that the seasonal pattern 
will interact with the nonseasonal pattern in the data. The ACF and PACF for a com- 
bined seasonal/nonseasonal process will reflect both elements. Note that, with quarterly 
data, a seasonal MA term can have the form 


$$y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_4 \varepsilon_{t-4} \qquad (2.66)$$


Alternatively, an autoregressive coefficient at lag 4 might have been used to capture
the seasonality


$$y_t = a_1 y_{t-1} + a_4 y_{t-4} + \varepsilon_t + \beta_1 \varepsilon_{t-1}$$


Both of these methods treat the seasonal coefficients additively; an AR or an MA
coefficient is added at the seasonal period. Multiplicative seasonality allows for the
interaction of the ARMA and the seasonal effects. Consider the multiplicative
specifications


$$(1 - a_1 L) y_t = (1 + \beta_1 L)(1 + \beta_4 L^4) \varepsilon_t \qquad (2.67)$$
$$(1 - a_1 L)(1 - a_4 L^4) y_t = (1 + \beta_1 L) \varepsilon_t \qquad (2.68)$$


Equation (2.67) differs from (2.66) in that it allows the moving average term at
lag 1 to interact with the seasonal moving average effect at lag 4. In the same way, (2.68)
allows the autoregressive term at lag 1 to interact with the seasonal autoregressive effect
at lag 4. Many researchers prefer the multiplicative form since a rich interaction pattern
can be captured with a small number of coefficients. Rewrite (2.67) as


$$y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_4 \varepsilon_{t-4} + \beta_1 \beta_4 \varepsilon_{t-5}$$


Estimating only three coefficients (i.e., $a_1$, $\beta_1$, and $\beta_4$) allows us to capture the
effects of an autoregressive term and the effects of moving average terms at lags 1, 4,
and 5. Of course, you do not really get something for nothing. The estimates of the three
moving average coefficients are interrelated. A researcher estimating the unconstrained
model $y_t = a_1 y_{t-1} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_4 \varepsilon_{t-4} + \beta_5 \varepsilon_{t-5}$ would necessarily obtain a
residual sum of squares that is no larger. However, (2.67) is clearly the more parsimonious
model. If the unconstrained value of $\beta_5$ approximates the product $\beta_1 \beta_4$, the multiplicative
model will be preferable. For this reason, most software packages have routines capable of
estimating multiplicative models. Otherwise, there are no theoretical grounds leading
us to prefer one form of seasonality over another. As illustrated in the last section,
experimentation and diagnostic checks are probably the best way to obtain the most
appropriate model.
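
The way the multiplicative form generates the lag-5 coefficient can be checked by multiplying the two lag polynomials directly. This is a small sketch with hypothetical function names and illustrative coefficient values; each list holds the coefficients on $L^0, L^1, L^2, \ldots$.

```python
def poly_mul(p, q):
    """Multiply two lag polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def multiplicative_ma(beta1, beta4, s=4):
    """Coefficients of (1 + beta1*L)(1 + beta4*L**s)."""
    nonseasonal = [1.0, beta1]
    seasonal = [1.0] + [0.0] * (s - 1) + [beta4]
    return poly_mul(nonseasonal, seasonal)

# Illustrative values only: the lag-5 coefficient equals beta1 * beta4.
print(multiplicative_ma(0.5, -0.4))
```

The unconstrained model estimates the lag-5 coefficient freely, whereas the multiplicative model ties it to the product of the other two.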
Seasonal Differencing


The Christmas shopping season is accompanied by an unusually large number of 
transactions, and the Federal Reserve expands the money supply to accommodate the 
increased demand for money. As shown by the dashed line in Figure 2.7, the U.S. 
money supply, as measured by M1, has a decidedly upward trend. The series, called
M1NSA, is contained in the file QUARTERLY.XLS. You can use the data to follow
along with the discussion below. The logarithmic change, shown by the solid line, 
appears to be stationary. Nevertheless, there is a clear seasonal pattern in that the value 
of the fourth quarter for any year is substantially higher than that for the adjacent 
quarters. 

This combination of strong seasonality and nonstationarity is often found in
economic data. The ACF for a process with strong seasonality is similar to that for a
nonseasonal process; the main difference is that the spikes at lags s, 2s, 3s, ..., do not
exhibit rapid decay. We know that it is necessary to difference (or take the logarithmic
change of) a nonstationary process. Similarly, if the autocorrelations at the seasonal
lags do not decay, it is necessary to take the seasonal difference so that the other
autocorrelations are not dwarfed by the seasonal effects. The ACF and PACF for the growth
rate of M1 are shown in Panel (a) of Figure 2.8. For now, just focus on the autocorrelations
at the seasonal lags. All seasonal autocorrelations are large and show no tendency
to decay. In particular, $\rho_4 = 0.58$, $\rho_8 = 0.50$, $\rho_{12} = 0.38$, $\rho_{16} = 0.34$, $\rho_{20} = 0.34$, and
$\rho_{24} = 0.37$. As should be clear from the figure, these autocorrelations are larger than
any of those at nonseasonal frequencies.

FIGURE 2.7 The Level and Growth Rate of M1

FIGURE 2.8 ACF and PACF (Panel a: M1 Growth)

The first step in the Box-Jenkins method is to transform the data so as to make
it stationary. As such, a logarithmic transformation is helpful because it can straighten
the nonlinear trend in M1. Let $y_t$ denote the log of M1. As mentioned above, the first
difference of the $\{y_t\}$ sequence, illustrated by the solid line in Figure 2.7, appears to
be stationary. However, to remove the strong seasonal persistence in the data, we need
to take the seasonal difference. For quarterly data, the seasonal difference is $y_t - y_{t-4}$.
Since the order of differencing is irrelevant, we can form the transformed sequence


$$m_t = (1 - L)(1 - L^4) y_t$$


Thus, we use the seasonal difference of the first difference. The ACF and PACF for
the $\{m_t\}$ sequence are shown in Panel (b) of Figure 2.8; the properties of this series are
much more amenable to the Box-Jenkins methodology. The autocorrelation and partial
autocorrelations for the first few lags are strongly suggestive of an AR(1) process ($\rho_1 =
\phi_{11} = 0.41$, $\rho_2 = 0.16$, and $\phi_{22} = -0.01$). Recall that the ACF for an AR(1) process
will decay and the PACF will cut to zero after lag 1. Given that $\rho_4 = -0.42$, $\rho_5 = -0.14$,
$\phi_{44} = -0.44$, and $\phi_{55} = 0.28$, there is evidence of remaining seasonality in the $\{m_t\}$
sequence. The seasonal term is most likely to be in the form of an MA coefficient since
the autocorrelation cuts to zero, whereas the PACF does not. Nevertheless, it is best
to estimate several similar models and then select the best. Estimates of the following
three models are reported in Table 2.5:


$m_t = a_0 + a_1 m_{t-1} + \varepsilon_t + \beta_4 \varepsilon_{t-4}$                Model 1: AR(1) with Seasonal MA
$m_t = a_0 + (1 + a_1 L)(1 + a_4 L^4) m_{t-1} + \varepsilon_t$        Model 2: Multiplicative Autoregressive
$m_t = a_0 + (1 + \beta_1 L)(1 + \beta_4 L^4) \varepsilon_t$               Model 3: Multiplicative Moving Average


Table 2.5 Three Models of Money Growth

               Model 1           Model 2           Model 3
$a_1$           0.541             0.496
               (8.59)            (7.66)
$a_4$                            -0.476
                                 (-7.28)
$\beta_1$                                           0.453
                                                   (6.84)
$\beta_4$      -0.759                              -0.751
              (-15.11)                            (-14.87)
SSR             0.0177            0.0214            0.0193
AIC            -735.9            -701.3            -720.1
SBC            -726.2            -691.7            -710.4
Q(4)         1.39 (0.845)      3.97 (0.410)     22.19 (0.000)
Q(8)         6.34 (0.609)     24.21 (0.002)     30.41 (0.000)
Q(12)       14.34 (0.279)     32.75 (0.001)     42.55 (0.000)

To ensure comparability, the three models are estimated over the
1962Q3-2008Q2 period. The estimated intercepts are not reported since
all were insignificantly different from zero. The figures in parentheses
following the Q-statistics are significance levels.


The point estimates of the coefficients all imply stationarity and invertibility.
Moreover, except for the intercepts, all are at least six standard deviations from zero.
However, the diagnostic statistics all suggest that model 1 is preferred. Model 1 has the best
fit in that it has the lowest SSR, AIC, and SBC. Moreover, the Q-statistics for lags 4,
8, and 12 indicate that the residual autocorrelations are insignificant. In contrast, the
residual correlations for model 2 are significant at the long lags [i.e., Q(8) and Q(12)
are significant at the 0.022 and 0.002 levels]. This is because the multiplicative
seasonal autoregressive (SAR) term does not adequately capture the seasonal pattern. An
SAR term implies autoregressive decay from period s into period s + 1. In Panel (b)
of Figure 2.8, the value of $\rho_4$ is -0.42 but $\rho_5$ is quite small. As such, a multiplicative
seasonal moving-average (SMA) term might be more appropriate. Model 3 properly
captures the seasonal pattern, but the MA(1) term does not capture the autoregressive
decay present at the short lags. Other diagnostic methods, including splitting the
sample, suggest that model 1 is appropriate.

The out-of-sample forecasts are shown in Figure 2.9. To create the one- through
twelve-step-ahead forecasts, model 1 was estimated over the full sample period
1961Q3-2012Q4. The estimated model is


$$m_t = 0.545 m_{t-1} + \varepsilon_t - 0.765 \varepsilon_{t-4} \qquad (2.69)$$


Given that $m_{2012:4} = -0.00176$ and the residual for 2012:1 was 0.00272 (i.e.,
$\varepsilon_{2012:1} = 0.00272$), the forecast of $m_{2013:1}$ is -0.00304. Now, use this forecast and the
value of $\varepsilon_{2012:2}$ to forecast $m_{2013:2}$. You can continue in this fashion so as to obtain the
out-of-sample forecasts for the $\{m_t\}$ sequence. Although you do not have the residuals
for periods beyond 2012:4, you can simply use their forecasted values of zero. The
trick to forecasting future values of M1 from the $\{m_t\}$ sequence is to sum the changes
and the seasonal changes so as to obtain the logarithm of the forecasted values of
M1. Since $m_t = (1 - L)(1 - L^4) \ln(M1_t)$, it follows that the value of $\ln(M1_t)$ can be
obtained from $m_t + \ln(M1_{t-1}) + \ln(M1_{t-4}) - \ln(M1_{t-5})$. The first 12 of the forecasted
values are plotted in Figure 2.9.

FIGURE 2.9 Forecasts of M1
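
The forecasting recursion and the step that undoes the differencing can be sketched as follows. The function names are hypothetical, and only $m_{2012:4}$ and $\varepsilon_{2012:1}$ are taken from the text; the remaining residuals and the log levels shown are placeholders, so only the first forecast is comparable with the value reported above.

```python
def forecast_m(m_last, recent_resid, steps, a1=0.545, beta4=-0.765):
    """Recursive forecasts from m_t = a1*m_{t-1} + e_t + beta4*e_{t-4}.

    recent_resid holds the last four in-sample residuals, oldest first.
    Residuals beyond the sample are replaced by their forecast of zero.
    """
    if len(recent_resid) != 4:
        raise ValueError("need exactly the last four residuals")
    resid = list(recent_resid) + [0.0] * steps
    forecasts, prev = [], m_last
    for h in range(steps):
        prev = a1 * prev + beta4 * resid[h]  # resid[h] is the shock four periods back
        forecasts.append(prev)
    return forecasts

def recover_log_level(log_history, m_forecasts):
    """Invert m_t = (1 - L)(1 - L^4) y_t: y_t = m_t + y_{t-1} + y_{t-4} - y_{t-5}."""
    if len(log_history) < 5:
        raise ValueError("need at least five past log levels")
    y = list(log_history)
    for m in m_forecasts:
        y.append(m + y[-1] + y[-4] - y[-5])
    return y[len(log_history):]

# m_{2012:4} and the 2012:1 residual are from the text; the other three
# residuals are placeholders set to zero purely for illustration.
m_hat = forecast_m(-0.00176, [0.00272, 0.0, 0.0, 0.0], steps=12)
print(m_hat[0])
```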

The procedures illustrated in this example with highly seasonal data are typical 
of many other series. With highly seasonal data, it is necessary to supplement the
Box-Jenkins method:


1. In the identification stage, it is usually necessary to seasonally difference the 
data and to check the ACF of the resultant series. Often, the seasonally differ- 
enced data will not be stationary. In such instances, the data may also need to 
be first differenced. 

2. Use the ACF and PACF to identify potential models. Try to estimate models 
with low-order nonseasonal ARMA coefficients. Consider both additive and 
multiplicative seasonality. Allow the appropriate form of seasonality to be 
determined by the various diagnostic statistics. 


A compact notation has been developed that allows for the efficient representation
of intricate models. As mentioned in the previous sections, the dth difference of a series
is denoted by $\Delta^d$. Hence,


$$\begin{aligned} \Delta^2 y_t &= \Delta(y_t - y_{t-1}) \\ &= y_t - 2y_{t-1} + y_{t-2} \end{aligned}$$


A seasonal difference is denoted by $\Delta_s$, where s is the period of the data. The Dth
such seasonal difference is $\Delta_s^D$. For example, if we want the second seasonal difference
of a monthly series, we can form


$$\begin{aligned} \Delta_{12}^2 y_t &= \Delta_{12}(y_t - y_{t-12}) \\ &= \Delta_{12} y_t - \Delta_{12} y_{t-12} \\ &= y_t - y_{t-12} - (y_{t-12} - y_{t-24}) \\ &= y_t - 2y_{t-12} + y_{t-24} \end{aligned}$$
Combining the two types of differencing yields $\Delta^d \Delta_s^D$. Multiplicative models are
written in the form ARIMA(p, d, q)(P, D, Q)$_s$
where: p and q = the numbers of nonseasonal AR and MA coefficients
d = number of nonseasonal differences
P = number of multiplicative autoregressive coefficients
D = number of seasonal differences
Q = number of multiplicative moving-average coefficients
s = seasonal period.
Using this notation, we can say that the fitted equation for $m_t = \Delta \Delta_4 \ln(M1_t)$ is an
ARIMA(1, 1, 0)(0, 1, 1)$_4$ model. In applied work, the ARIMA(1, 1, 0)(0, 1, 1)$_s$ and the
ARIMA(0, 1, 1)(0, 1, 1)$_s$ models occur routinely; the latter is called the airline model
ever since Box and Jenkins (1976) used this model to analyze airline travel data.
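
The differencing operators in this notation are easy to apply directly. The sketch below uses hypothetical function names and an artificial series; it forms $\Delta^d \Delta_s^D y_t$ and confirms that the order of differencing does not matter.

```python
def diff(y, lag=1):
    """Return y_t - y_{t-lag}; the result is `lag` observations shorter than y."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

def arima_difference(y, d=0, D=0, s=1):
    """Apply D seasonal differences of period s and then d first differences."""
    out = list(y)
    for _ in range(D):
        out = diff(out, s)
    for _ in range(d):
        out = diff(out, 1)
    return out

# Artificial series used only to illustrate the operators.
y = [i ** 3 for i in range(12)]
print(arima_difference(y, d=1, D=1, s=4))  # the (1 - L)(1 - L^4) transformation
print(diff(diff(y, 1), 4) == diff(diff(y, 4), 1))  # order of differencing is irrelevant
```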


12. PARAMETER INSTABILITY AND 
STRUCTURAL CHANGE 


One key assumption of the Box-Jenkins methodology is that the structure of the
data-generating process does not change. As such, the values of the $a_i$ and $\beta_i$ should
be constant from one period to the next. However, in some circumstances, there may
be reasons to suspect a structural break in the data-generating process. For example,
in a model of GDP growth, it seems natural to inquire whether the oil price shocks
of 1973, the events surrounding the tragedy of September 11, 2001, and/or the financial
crisis of 2008 had any significant impacts on the coefficients. Of course, parameter
instability need not result from a single discrete event. The recent evidence concerning
climate change suggests that weather-sensitive series such as crop yields, rainfall, and
the number of ski days at Snowmass are most likely to be affected in a sustained, but
gradual, way.


Testing for Structural Change 


If you have reason to suspect a structural break at a particular date, it is straightforward
to use a Chow test. The essence of the Chow test is to fit the same ARMA model to the
prebreak data and to the postbreak data. If the two models are not sufficiently different,
it can be concluded that there has not been any structural change in the data-generating
process.

In general, suppose you estimated an ARMA(p, q) model using a sample size of
T observations. Denote the sum of the squared residuals as SSR. Also, suppose that
you have reason to suspect a structural break immediately following date $t_m$. You can
perform a Chow test by dividing the T observations into two subsamples with $t_m$
observations in the first and $t_n = T - t_m$ observations in the second. Use each subsample to
estimate the two models:


$$y_t = a_0(1) + a_1(1) y_{t-1} + \cdots + a_p(1) y_{t-p} + \varepsilon_t + \beta_1(1) \varepsilon_{t-1} + \cdots + \beta_q(1) \varepsilon_{t-q} \quad \text{using } t_1, \ldots, t_m$$

$$y_t = a_0(2) + a_1(2) y_{t-1} + \cdots + a_p(2) y_{t-p} + \varepsilon_t + \beta_1(2) \varepsilon_{t-1} + \cdots + \beta_q(2) \varepsilon_{t-q} \quad \text{using } t_{m+1}, \ldots, t_T$$


Let the sum of the squared residuals from each model be SSR$_1$ and SSR$_2$,
respectively. To test the restriction that all coefficients are equal [i.e., $a_0(1) = a_0(2)$
and $a_1(1) = a_1(2)$ and ... $a_p(1) = a_p(2)$ and $\beta_1(1) = \beta_1(2)$ and ... $\beta_q(1) = \beta_q(2)$], use
an F-test and form:

$$F = \frac{(SSR - SSR_1 - SSR_2)/n}{(SSR_1 + SSR_2)/(T - 2n)}$$


where n = number of parameters estimated (n = p + q + 1 if an intercept is included
and p + q otherwise) and the number of degrees of freedom are (n, T - 2n).

Intuitively, if the restriction is not binding (i.e., if the coefficients are equal), the
sum SSR$_1$ + SSR$_2$ should equal the sum of the squared residuals from the entire sample
estimation. Hence, F should equal zero. The larger the calculated value of F, the more
restrictive is the assumption that the coefficients are equal.

Of course, the method requires that there be a reasonable number of observations
in each subsample. If either $t_m$ or $t_n$ is very small, the estimated coefficients will have
little precision. An alternative type of Chow test is to use a dummy variable to detect a
break in one or more of the coefficients. For example, if a break is suspected right after
period $t_m$, you can create a dummy variable, $D_t$, such that $D_t = 0$ for all $t \le t_m$ and
$D_t = 1$ for $t > t_m$. To test for a break in the intercept of an AR(1) model, for example,
check for the significance of $D_t$ in the regression $y_t = a_0 + \alpha_0 D_t + a_1 y_{t-1} + \varepsilon_t$. To
allow for a break in both coefficients, also create the variable $D_t y_{t-1}$ and estimate the
regression equation $y_t = a_0 + \alpha_0 D_t + a_1 y_{t-1} + \alpha_1 D_t y_{t-1} + \varepsilon_t$. You can test for a break
by examining the individual t-statistics of $\alpha_0$ and $\alpha_1$ and the F-statistic for the null
hypothesis $\alpha_0 = \alpha_1 = 0$.

Return to the example of the interest rate spread examined in Section 10. Suppose
that there is reason to believe a break occurred at the end of 1981Q4. Consider the
estimates for the two subperiods:


$$s_t = 0.923 + 0.367 s_{t-1} + 0.285 s_{t-2} + \varepsilon_t + 0.815 \varepsilon_{t-1} - 0.153 \varepsilon_{t-7}$$
(1960Q3-1981Q4)


(2.70) 


and 


$$s_t = 1.799 + 0.800 s_{t-1} + 0.053 s_{t-2} + \varepsilon_t + 0.354 \varepsilon_{t-1} + 0.097 \varepsilon_{t-7}$$
(1982Q1-2008Q1)


Although the coefficients of the models appear to be dissimilar, we can formally
test for the equality of coefficients using (2.70). Respectively, the sums of squared
residuals for the two equations are SSR$_1$ = 27.564 and SSR$_2$ = 21.414. Estimating
the model over the full sample period yields SSR = 49.692. Since there are 191
usable observations in the sample and n = 5 (the intercept plus the four estimated
coefficients), (2.70) becomes


$$F = \frac{(49.692 - 27.564 - 21.414)/5}{(27.564 + 21.414)/(191 - 10)} = 0.527$$


With 5 degrees of freedom in the numerator and 181 in the denominator, we can- 
not reject the null of no structural change in the coefficients (i.e., we can accept the 
hypothesis that there is no structural change in the coefficients). 
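
The arithmetic of the Chow test is simple enough to sketch in code. The function name `chow_f` is hypothetical; the inputs are the sums of squared residuals, the number of usable observations, and the number of estimated parameters quoted in the example.

```python
def chow_f(ssr_full, ssr_1, ssr_2, t_obs, n_params):
    """Chow F-statistic and its (numerator, denominator) degrees of freedom."""
    df_num = n_params
    df_den = t_obs - 2 * n_params
    if df_den <= 0:
        raise ValueError("not enough observations for the two subsample fits")
    f_stat = ((ssr_full - ssr_1 - ssr_2) / df_num) / ((ssr_1 + ssr_2) / df_den)
    return f_stat, (df_num, df_den)

# Values from the interest rate spread example in the text.
f_stat, dof = chow_f(49.692, 27.564, 21.414, t_obs=191, n_params=5)
print(f_stat, dof)
```

The statistic is then compared with the critical value of an F distribution with the returned degrees of freedom.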

Alternatively, to test for a break in the intercept only, we can create the dummy
variable $D_t$ equal to zero prior to 1982Q1 and equal to unity beginning in 1982Q1.
Now, consider the equation for the spread estimated over the entire 1960Q1-2008Q1
period


$$s_t = 1.277 + 0.312 D_t + 0.336 s_{t-1} + 0.435 s_{t-2} + \varepsilon_t + 0.837 \varepsilon_{t-1} - 0.134 \varepsilon_{t-7}$$
      (3.55)   (0.82)      (3.23)          (4.43)                  (13.14)         (-3.33)


Since $D_t$ jumps from 0 to 1 in 1982Q1, the estimate for the intercept is 1.277
prior to 1982Q1 and 1.589 (= 1.277 + 0.312) beginning in 1982Q1. However, since
the null hypothesis that the coefficient on $D_t$ equals zero cannot be rejected (the
t-statistic is 0.82), there is no evidence of a significant intercept break.


Endogenous Breaks 


The Chow test asks whether there is a break beginning at some particular known break
date $t_m$. A break occurring at a date not prespecified by the researcher is called an
endogenous break to denote the fact that it was not the result of a fixed break date
such as September 11, 2001. To determine whether there is a break anywhere in the
sample, you could perform a Chow test for every potential break date $t_m$. It should not be
surprising that the break date that results in the largest value of the F-statistic provides a
consistent estimate of the actual break date, if any. In order to ensure an adequate
number of observations in each of the two subsamples, it is necessary to have a “trimming”
such that the break could not occur before the first $t_0$ observations or after the last $T - t_0$
observations. In applied research, it is common to use a trimming value of 10% so that
there are at least 10% of the observations in each of the two subsamples. In the interest
rate spread example, there are 191 usable observations in the 1960Q1-2008Q1 period
(since the first two are lost when estimating the coefficient for $s_{t-2}$). If you used a 10%
trimming, you could check for a break everywhere in the interval 1965Q1-2003Q2
(each about 19 observations from the beginning and end of the usable data). Unfortunately,
searching for the most likely break date means that the F-statistic for the null
hypothesis of no break is inflated. After all, you have just searched for the date that leads
to the maximum, or supremum, value of the sample F-statistic. As such, the distribution
for the F-statistic is not standard and cannot be obtained from a traditional F-table.
As detailed in Chapter 7, a number of papers, including Andrews and Ploberger (1994)
and Hansen (1997), show how to obtain the appropriate critical values. Fortunately, a
number of software packages can readily perform such tests.
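
The search over candidate break dates can be sketched for the simplest case, an AR(1) model with an intercept estimated by OLS. Everything here is an assumption made for illustration: the function names, the simulated series, and the choice of an AR(1) rather than the ARMA models used in the text. The sketch returns the largest F-statistic and its date, but it does not supply the nonstandard critical values needed to judge significance.

```python
import random

def ssr_ols(pairs):
    """SSR from an OLS regression of y_t on a constant and y_{t-1}."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((y - intercept - slope * x) ** 2 for x, y in pairs)

def sup_f(y, trim=0.10):
    """Largest Chow F-statistic over all break dates allowed by the trimming."""
    pairs = list(zip(y[:-1], y[1:]))  # (y_{t-1}, y_t)
    t_obs, n = len(pairs), 2          # n = intercept plus AR(1) coefficient
    ssr_full = ssr_ols(pairs)
    first = max(int(trim * t_obs), n + 1)
    best_f, best_k = float("-inf"), None
    for k in range(first, t_obs - first + 1):
        s1, s2 = ssr_ols(pairs[:k]), ssr_ols(pairs[k:])
        f = ((ssr_full - s1 - s2) / n) / ((s1 + s2) / (t_obs - 2 * n))
        if f > best_f:
            best_f, best_k = f, k
    return best_f, best_k

def simulate_break(t_obs=300, break_at=150, seed=0):
    """Hypothetical AR(1) series whose intercept shifts from 0 to 5 at break_at."""
    rng = random.Random(seed)
    y = [0.0]
    for t in range(1, t_obs):
        a0 = 0.0 if t < break_at else 5.0
        y.append(a0 + 0.3 * y[-1] + rng.gauss(0.0, 1.0))
    return y

print(sup_f(simulate_break()))
```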
Parameter Instability


Notice that the Chow test and its variants require the researcher to specify a particular 
break date and to assume that the break fully manifests itself at that date. The inter- 
cept, for example, is aọ(1) up to ¢,, and is precisely a)(2) beginning at f,,,, ,. However, 
the assumption that a break occurs exactly at a single point in time may not always be 
appropriate. As mentioned above, there is no particular date at which we can say that 
significant climate change has occurred. Similarly, it is not clear how we can provide a 
specific break date to denote the advent “financial deregulation” in the asset markets or 
to assign a specific date to the development of the microcomputer. These are processes 
that have been evolving over time. Even if we could date the precise start of financial 
deregulation or the computer revolution, the full effects of these changes would not 
occur instantly. As such, it should not be surprising that a number of procedures have 
been developed that check for parameter stability without the need to identify a partic- 
ular break date. Probably, the simplest method is to estimate the model recursively. For 
example, if you have 150 observations, you can estimate the model using only the first 
few, say 10, observations. Plot the individual coefficients and then reestimate the model 
using the first 11 observations. You can keep repeating this process until you use all 150 
observations. In general, the plots of the coefficients will not be flat since the prelimi- 
nary values are estimated using a very small number of observations. However, after a 
“burn-in” period, the time plots of the individual coefficients can provide evidence of 
coefficient stability. If the magnitude of a coefficient suddenly begins to change, you 
should suspect a structural change at that point. A sustained change in a coefficient 
might indicate a model misspecification. One particularly helpful modification of this 
procedure is to plot each coefficient along with its estimated ±2 standard deviation
band. The bands represent confidence intervals for the estimated coefficients. In this 
way, it can be seen if the coefficients are always statistically significant and whether 
the coefficients in the early periods appear to be statistically different from those of the 
latter periods. 
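
The recursive procedure described above can be sketched in a few lines of code. The following is only an illustration, assuming an AR(1) model estimated by OLS with NumPy; the function name `recursive_ar1`, the simulated series, and the seed are hypothetical choices and are not taken from the text.

```python
import numpy as np

def recursive_ar1(y, start=10):
    """Recursively estimate y_t = a0 + a1*y_{t-1} + e_t by OLS.

    Returns one row per sample end point:
    (last observation used, estimate of a1, standard error of a1).
    """
    y = np.asarray(y, dtype=float)
    rows = []
    for end in range(start, len(y) + 1):
        # Use only observations 1..end: regress y_2..y_end on a constant and the lag
        X = np.column_stack([np.ones(end - 1), y[:end - 1]])
        z = y[1:end]
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        resid = z - X @ beta
        sigma2 = resid @ resid / (len(z) - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        rows.append((end, beta[1], np.sqrt(cov[1, 1])))
    return np.array(rows)

# Simulated stable AR(1) series (illustrative only; the seed is arbitrary)
rng = np.random.default_rng(0)
y = np.zeros(150)
for t in range(1, 150):
    y[t] = 1.0 + 0.5 * y[t - 1] + rng.standard_normal()

est = recursive_ar1(y, start=10)
lower = est[:, 1] - 2 * est[:, 2]   # lower edge of the +/- 2 standard deviation band
upper = est[:, 1] + 2 * est[:, 2]   # upper edge of the band
```

Plotting `est[:, 1]` together with `lower` and `upper` against `est[:, 0]` gives the type of time plot discussed above. The early bands should be wide because they rest on very few observations.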

At each step along the way, it is also possible to create the one-step-ahead forecast
error. Let $e_t(1)$ be the one-step-ahead forecast error made using all observations
through $t$. In other words, $e_t(1)$ is the difference between $y_{t+1}$ and your conditional
forecast of $y_{t+1}$ (i.e., $E_t y_{t+1}$). If you start with the first 10 observations, the value of
$e_{10}(1)$ will be $y_{11} - E_{10}y_{11}$ and the value of $e_{149}(1)$ will be $y_{150} - E_{149}y_{150}$. [Note: If
you understand the notation, it should be clear that you cannot create the value $e_{150}(1)$
since you do not have the value of $y_{151}$.] If your model fits the data well, the forecasts
should be unbiased so that the sum of these forecast errors should not be “too far” from
zero. In fact, Brown, Durbin, and Evans (1975) calculate whether the cumulated sum of
the forecast errors is statistically different from zero. To be a bit more formal, define:


$$CUSUM_N = \sum_{i=n}^{N} e_i(1)/\sigma_e, \qquad N = n, \ldots, T-1$$

where $n$ denotes the date of the first forecast error you constructed, $T$ denotes the date
of the last observation in the data set, and $\sigma_e$ is the estimated standard deviation of
the forecast errors. With 150 total observations ($T = 150$), if you start the procedure
using the first 10 observations ($n = 10$), 140 forecast errors ($T - n$) can be created.
Note that $\sigma_e$ is created using all $T - n$ forecast errors. If $n = 10$, to create $CUSUM_{10}$,
use the first 10 observations to create the one-step-ahead forecast error and construct
$e_{10}(1)/\sigma_e$. Now let $N = 11$ and create $CUSUM_{11}$ as $[e_{10}(1) + e_{11}(1)]/\sigma_e$. Similarly,
$CUSUM_{T-1} = [e_{10}(1) + \cdots + e_{T-1}(1)]/\sigma_e$. If you use the 5% significance level, the
plotted value of each $CUSUM_N$ should be within a band of approximately
$\pm 0.948[(T - n)^{0.5} + 2(N - n)(T - n)^{-0.5}]$.
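
To make the construction concrete, the sketch below computes the CUSUMs and the approximate 5% band exactly as defined above for an AR(1) model. It is a minimal illustration rather than a definitive implementation: the function name `cusum_ar1`, the use of the sample standard deviation (`ddof=1`) for $\sigma_e$, and the simulated series are assumptions made here.

```python
import numpy as np

def cusum_ar1(y, n=10):
    """CUSUM of one-step-ahead forecast errors from recursively estimated AR(1) models."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    errors = []
    for t in range(n, T):                        # forecast origins t = n, ..., T-1
        # Estimate the AR(1) model using observations 1..t only
        X = np.column_stack([np.ones(t - 1), y[:t - 1]])
        beta, *_ = np.linalg.lstsq(X, y[1:t], rcond=None)
        forecast = beta[0] + beta[1] * y[t - 1]  # E_t y_{t+1}
        errors.append(y[t] - forecast)           # e_t(1)
    errors = np.array(errors)
    sigma_e = errors.std(ddof=1)                 # assumption: sample s.d. of all T-n errors
    N = np.arange(n, T)
    cusum = np.cumsum(errors) / sigma_e
    band = 0.948 * (np.sqrt(T - n) + 2 * (N - n) / np.sqrt(T - n))
    return N, cusum, band

# Simulated series with a break at date 101 (illustrative; the seed is arbitrary)
rng = np.random.default_rng(1)
y = np.zeros(150)
y[0] = 2.0
for i in range(1, 150):
    if i + 1 < 101:
        y[i] = 1.0 + 0.5 * y[i - 1] + rng.standard_normal()
    else:
        y[i] = 2.5 + 0.65 * y[i - 1] + rng.standard_normal()

N, cusum, band = cusum_ar1(y, n=10)
outside = N[np.abs(cusum) > band]   # dates at which the CUSUM leaves the band, if any
```

With $T = 150$ and $n = 10$, the arrays contain the 140 values $CUSUM_{10}, \ldots, CUSUM_{149}$. The band widens linearly in $N$, which is one reason a break late in the sample can be hard to detect.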


An Example of a Break 


In order to illustrate a breaking series, the first panel of Figure 2.10 shows 150 observations
of the simulated series $y_t = 1 + 0.5y_{t-1} + \varepsilon_t$ for $t < 101$ and $y_t = 2.5 + 0.65y_{t-1} + \varepsilon_t$
for $t \geq 101$. The series is contained in the file Y_BREAK.XLS. Of course, in applied
work, the break may not be so readily apparent. If you ignore the break and estimate 
the entire series as an AR(1) process, you should obtain 


$$y_t = \underset{(2.635)}{0.4442} + \underset{(22.764)}{0.8822}\,y_{t-1}$$


As indicated by the remaining two panels of the figure, the estimated AR(1) model 
is seriously misspecified. Panel 2 shows the estimates of the AR(1) coefficient (along 
with their ±2 standard deviation bands) resulting from a recursive estimation. The initial
confidence intervals are quite wide since the first few estimations use a very small
number of observations. The estimates all seem reasonable until about $t = 100$. At this
point, the estimates of the AR(1) coefficient rise (the reverse of what happens in the 
data-generating process). Note that the confidence bands do not even overlap those 
from the middle periods. The clear suggestion is that there has been a significant struc- 
tural change. The CUSUMs, shown in Panel (3), are clearly within the 90% confidence 
interval for t < 101. At this point, they begin to drift upward and depart from the band 
at t = 125. As such, the hypothesis of coefficient stability can be rejected. 

Notice that the CUSUMs do not actually depart from the band until late in the 
sample. This is indicative of the problem that the CUSUM test may not detect coeffi- 
cient instability occurring late in the sample period. Moreover, the test may not have 
much power if there are multiple changes with little overall effect on the CUSUMs. 
Nevertheless, the test is a useful diagnostic tool that does not require the researcher 
to stipulate the nature of the model’s misspecification. It is able to detect model
misspecifications from sources as varied as smooth structural breaks, multiple
breaks, neglected nonlinearities in the data-generating process, and an overly
parsimonious model. A variant of the test, often called CUSUM(2), is to form the CUSUMs
using the squared errors. The use of the squared errors can help detect changes in the 
variance. 

FIGURE 2.10 Recursive Estimation of the Model

If you had strong reason to believe that the break occurred in period 101, you could
form a dummy variable $D_t = 0$ from $t = 1$ to 100 and $D_t = 1$ thereafter. To check for
an intercept break, estimate


$$y_t = \underset{(5.36)}{0.9254} + \underset{(8.91)}{0.5683}\,y_{t-1} + \underset{(5.88)}{1.936}\,D_t$$


Since the coefficient for $D_t$ is highly significant, you can conclude that there was
a break in the intercept. To check for a break in the intercept and slope coefficient, also
form the variable $D_t y_{t-1}$ and estimate:


$$y_t = \underset{(7.22)}{1.6015} + \underset{(2.76)}{0.2545}\,y_{t-1} - \underset{(-0.391)}{0.2244}\,D_t + \underset{(4.47)}{0.5433}\,D_t y_{t-1}$$


In this particular case, the dummy variables indicate that there is a break but do
not measure the size of the break very well. (Note: The actual break in the intercept
is +1.5 and the actual break in the AR(1) coefficient is 0.15.) The coefficient for the
intercept break is not significant while the break in the slope coefficient is highly
significant. The F-statistic for the joint hypothesis that the coefficients on $D_t$ and $D_t y_{t-1}$
are equal to zero is 29.568. With 2 degrees of freedom in the numerator and 145 in the 
denominator, this value is significant at any conventional level. The important point 
is that you can conclude that the simple AR(1) model is misspecified because of a 
structural break. 
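
The dummy-variable test can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `ols_ssr` and `break_f_test` are hypothetical, and the series is simulated here rather than read from Y_BREAK.XLS, so the numbers it produces will not match those reported above.

```python
import numpy as np

def ols_ssr(X, z):
    """Return the OLS coefficients and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    return beta, resid @ resid

def break_f_test(y, tm):
    """F-test that the coefficients on D_t and D_t*y_{t-1} are jointly zero.

    D_t = 0 through date tm and D_t = 1 afterwards (dates are 1-based).
    """
    y = np.asarray(y, dtype=float)
    z, lag = y[1:], y[:-1]
    dates = np.arange(2, len(y) + 1)     # date of each element of z
    D = (dates > tm).astype(float)
    const = np.ones_like(z)
    _, ssr_r = ols_ssr(np.column_stack([const, lag]), z)                 # AR(1), no break
    beta_u, ssr_u = ols_ssr(np.column_stack([const, lag, D, D * lag]), z)
    q, dof = 2, len(z) - 4               # 2 restrictions; 149 usable obs less 4 coefficients
    f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / dof)
    return f_stat, beta_u, dof

# Simulated series mimicking the data-generating process in the text (arbitrary seed)
rng = np.random.default_rng(0)
y = np.empty(150)
y[0] = 2.0
for i in range(1, 150):
    if i + 1 < 101:
        y[i] = 1.0 + 0.5 * y[i - 1] + rng.standard_normal()
    else:
        y[i] = 2.5 + 0.65 * y[i - 1] + rng.standard_normal()

f_stat, beta_u, dof = break_f_test(y, tm=100)
```

Here `beta_u` holds the intercept, the AR(1) coefficient, the intercept break, and the slope break, in that order. With 150 observations, `dof` equals 145, matching the denominator degrees of freedom quoted above.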

If you wanted to estimate the most likely value for $t_m$, you could repeat the estimation
for every time period in the interval $15 < t_m < 135$. The values of the F-statistics
from each recursive estimation are shown in Figure 2.11. Notice that the F-values are
largest for $t_m = 100$. Although this consistent estimate of the break date turns out to
be exactly correct, you should expect a discrepancy when using actual data. Also note
that the null hypothesis of no structural change can be tested with the F-statistic (and
the t-statistics of the individual coefficients) using Hansen’s (1997) method (see Chapter 7).
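
The search over candidate break dates amounts to repeating the dummy-variable regression for each $t_m$ and keeping the date with the largest F-statistic. A minimal sketch follows; the function names and the simulated data are assumptions made for illustration. Note that the maximized F-statistic does not have a standard F-distribution, so the code only locates the break date and does not attach a significance level to it.

```python
import numpy as np

def break_f_stat(y, tm):
    """F-statistic for a joint intercept and slope break after date tm in an AR(1)."""
    y = np.asarray(y, dtype=float)
    z, lag = y[1:], y[:-1]
    dates = np.arange(2, len(y) + 1)             # 1-based date of each element of z
    D = (dates > tm).astype(float)
    X_r = np.column_stack([np.ones_like(z), lag])
    X_u = np.column_stack([X_r, D, D * lag])
    ssr = []
    for X in (X_r, X_u):
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        resid = z - X @ beta
        ssr.append(resid @ resid)
    return ((ssr[0] - ssr[1]) / 2) / (ssr[1] / (len(z) - 4))

def search_break_date(y, lo=15, hi=135):
    """Evaluate the F-statistic for every candidate date with lo < tm < hi."""
    candidates = np.arange(lo + 1, hi)
    f_stats = np.array([break_f_stat(y, tm) for tm in candidates])
    return candidates[np.argmax(f_stats)], candidates, f_stats

# Simulated series with a break at date 101 (illustrative; the seed is arbitrary)
rng = np.random.default_rng(3)
y = np.empty(150)
y[0] = 2.0
for i in range(1, 150):
    if i + 1 < 101:
        y[i] = 1.0 + 0.5 * y[i - 1] + rng.standard_normal()
    else:
        y[i] = 2.5 + 0.65 * y[i - 1] + rng.standard_normal()

best_tm, candidates, f_stats = search_break_date(y)
```

Plotting `f_stats` against `candidates` gives the type of graph shown in Figure 2.11.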
FIGURE 2.11 Recursive F-tests


13. COMBINING FORECASTS


What should you do if you have several plausible models and want to use them to fore- 
cast? For example, in Section 10, it turned out that there are several plausible models 
of the interest rate spread. It makes little sense to forecast only with the “best” model 
and discard the others. After all, the other models may capture some information that
is not contained in the “best” model. The natural answer is to forecast with all of the plausible
models and then take the average of the forecasts. 

It turns out that the intuitive notion of using the average, or composite, forecast is
quite reasonable. Bates and Granger (1969) were among the first to confirm the intuition
that a weighted average of forecasts can be quite beneficial. Let the series $f_{it}$ contain
the one-step-ahead forecasts of $y_t$ from model $i$ ($i = 1, 2, \ldots, n$). Consider the composite
forecast $f_{ct}$ constructed as a weighted average of the individual forecasts

$$f_{ct} = w_1 f_{1t} + w_2 f_{2t} + \cdots + w_n f_{nt} \qquad (2.71)$$

where the $w_i$ are weights such that $\sum_{i=1}^{n} w_i = 1$.

If the forecasts are unbiased (so that $E_{t-1}f_{it} = y_t$), it follows that the composite
forecast is also unbiased:

$$E_{t-1}f_{ct} = w_1 E_{t-1}f_{1t} + w_2 E_{t-1}f_{2t} + \cdots + w_n E_{t-1}f_{nt} = w_1 y_t + w_2 y_t + \cdots + w_n y_t = y_t$$


To keep the notation simple, return to the case in which $n = 2$. Subtract $y_t$ from
each side of (2.71) to obtain

$$f_{ct} - y_t = w_1(f_{1t} - y_t) + (1 - w_1)(f_{2t} - y_t)$$

Now let $e_{1t}$ and $e_{2t}$ denote the series containing the one-step-ahead forecast errors
from models 1 and 2 (i.e., $e_{it} = y_t - f_{it}$) and let $e_{ct}$ be the composite forecast error. As
such, we can write

$$e_{ct} = w_1 e_{1t} + (1 - w_1)e_{2t}$$

The variance of the composite forecast error is

$$\mathrm{var}(e_{ct}) = w_1^2\,\mathrm{var}(e_{1t}) + (1 - w_1)^2\,\mathrm{var}(e_{2t}) + 2w_1(1 - w_1)\,\mathrm{cov}(e_{1t}, e_{2t}) \qquad (2.72)$$


At this point, you should be able to see the potential benefits of combining such
forecasts. To take a simple example, suppose that the forecast error variances are the
same size and that $\mathrm{cov}(e_{1t}, e_{2t}) = 0$. If you take a simple average by setting $w_1 = 0.5$,
(2.72) indicates that the variance of the composite forecast is 50% of the variance of
either forecast: $\mathrm{var}(e_{ct}) = 0.25\,\mathrm{var}(e_{1t}) + 0.25\,\mathrm{var}(e_{2t}) = 0.5\,\mathrm{var}(e_{1t}) = 0.5\,\mathrm{var}(e_{2t})$.


Optimal Weights 


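Although simple averaging can work to reduce the forecast error variance, it is possible
to find the optimal weights. If we use (2.72) and select the weight $w_1$ so as to minimize
$\mathrm{var}(e_{ct})$, the first-order condition uses the derivative

$$\frac{\partial\,\mathrm{var}(e_{ct})}{\partial w_1} = 2w_1\,\mathrm{var}(e_{1t}) - 2(1 - w_1)\,\mathrm{var}(e_{2t}) + 2(1 - 2w_1)\,\mathrm{cov}(e_{1t}, e_{2t})$$

Setting the derivative equal to zero, the optimal value of $w_1$ (called $w_1^*$) is

$$w_1^* = \frac{\mathrm{var}(e_{2t}) - \mathrm{cov}(e_{1t}, e_{2t})}{\mathrm{var}(e_{1t}) + \mathrm{var}(e_{2t}) - 2\,\mathrm{cov}(e_{1t}, e_{2t})} \qquad (2.73)$$

and if $\mathrm{cov}(e_{1t}, e_{2t}) = 0$, $w_1^*$ can be written as

$$w_1^* = \frac{\mathrm{var}(e_{2t})}{\mathrm{var}(e_{1t}) + \mathrm{var}(e_{2t})} = \frac{\mathrm{var}(e_{1t})^{-1}}{\mathrm{var}(e_{1t})^{-1} + \mathrm{var}(e_{2t})^{-1}}$$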

Hence, if the covariance is zero, the optimal weight attached to a forecast is inversely
proportional to its forecast error variance. As $\mathrm{var}(e_{1t})$ gets relatively small, the weight
attached to $f_{1t}$ goes to unity, and as $\mathrm{var}(e_{1t})$ gets relatively large, the weight attached
to $f_{1t}$ goes to zero. Since the actual forecast error variances are not known, in practice,
they are replaced by the estimated forecast error variances from the type of out-of-sample
forecast exercises conducted above. In a setting with a large number of competing
forecasting models, constructing optimal weights as in (2.73) can be quite tedious.
Moreover, estimates of the covariance terms are often poor. As such, a number of
researchers, including Bates and Granger (1969), recommend constructing the weights
excluding the covariance terms. Hence, in the n-variable case, the weights can be
constructed as

$$w_i^* = \frac{\mathrm{var}(e_{it})^{-1}}{\mathrm{var}(e_{1t})^{-1} + \mathrm{var}(e_{2t})^{-1} + \cdots + \mathrm{var}(e_{nt})^{-1}} \qquad (2.74)$$


It is straightforward to compute the reciprocals of the forecast error variance from
each model and then to normalize each by the sum across all of the models. Granger
and Ramanathan (1984) show that an equivalent method for constructing the weights
is to use a regression model. Consider the regression equation


$$y_t = \alpha_0 + \alpha_1 f_{1t} + \alpha_2 f_{2t} + \cdots + \alpha_n f_{nt} + v_t \qquad (2.75)$$


Of course, it would be possible to force $\alpha_0 = 0$ and $\alpha_1 + \alpha_2 + \cdots + \alpha_n = 1$. Under
these conditions, the $\alpha_i$’s would have the direct interpretation of optimal weights so
that $w_i^*$ could be set equal to $\alpha_i$. However, Granger and Ramanathan recommend the
inclusion of an intercept to account for any bias and to leave the $\alpha_i$’s unconstrained. As
surveyed in Clemen (1989), not all researchers agree with the Granger–Ramanathan
recommendation and a substantial amount of work has been conducted so as to obtain
optimal weights.
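
Both versions of the regression in (2.75) can be sketched with ordinary least squares. The code below is a minimal illustration: the function name `regression_weights` and the simulated forecasts are hypothetical, and the constrained version imposes a zero intercept and weights that sum to unity by substituting out the last coefficient.

```python
import numpy as np

def regression_weights(y, F, constrained=False):
    """Estimate combination weights by regressing y on the columns of F.

    constrained=False: include an intercept and leave the coefficients free
                       (returns [a0, a1, ..., an]).
    constrained=True:  force a0 = 0 and a1 + ... + an = 1 (returns [a1, ..., an]).
    """
    y = np.asarray(y, dtype=float)
    F = np.asarray(F, dtype=float)
    if constrained:
        # Substitute a_n = 1 - (a_1 + ... + a_{n-1}) and regress (y - f_n) on (f_i - f_n)
        Z = F[:, :-1] - F[:, [-1]]
        a, *_ = np.linalg.lstsq(Z, y - F[:, -1], rcond=None)
        return np.append(a, 1.0 - a.sum())
    X = np.column_stack([np.ones(len(y)), F])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data: two noisy forecasts of the same series (arbitrary seed)
rng = np.random.default_rng(2)
y = rng.standard_normal(200)
F = np.column_stack([y + 0.5 * rng.standard_normal(200),
                     y + 1.0 * rng.standard_normal(200)])

unconstrained = regression_weights(y, F)                 # intercept plus two free weights
constrained = regression_weights(y, F, constrained=True) # two weights summing to one
```

Nothing in either regression prevents an estimated weight from being negative.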

There are two important differences between (2.74) and (2.75). In (2.75), one 
or more of the estimated weights may be negative. In such circumstances, most 
researchers would reestimate the regression without the forecast associated with 
the most negative coefficient. Moreover, in (2.75), the $\{v_t\}$ sequence can be serially
correlated. In such circumstances, Diebold (1988) recommends using lagged values
of $y_t$ and/or moving average terms to capture the serial correlation.

It is also possible to use the SBC as a weighting factor. Now the weights are determined
by in-sample fit instead of out-of-sample forecasts. Let $SBC_i$ be the SBC from
model $i$ and let $SBC^*$ be the SBC from the best fitting model. You can easily form
$\alpha_i = \exp[(SBC^* - SBC_i)/2]$ and then construct the weights


$$w_i^* = \alpha_i \Big/ \sum_{i=1}^{n} \alpha_i$$


Since $\exp(0) = 1$, the model with the best fit has the weight $1/\sum \alpha_i$. Since $\alpha_i$ is
decreasing in the value of $SBC_i$, models with a poor fit have smaller weights than
models with small values of the SBC.
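
The SBC weighting scheme takes only a few lines to compute. In the sketch below, the function name `sbc_weights` and the three SBC values are hypothetical; the best fitting model is taken to be the one with the smallest SBC.

```python
import numpy as np

def sbc_weights(sbc):
    """Weights proportional to exp[(SBC* - SBC_i)/2], where SBC* is the smallest SBC."""
    sbc = np.asarray(sbc, dtype=float)
    alpha = np.exp((sbc.min() - sbc) / 2.0)   # equals 1 for the best fitting model
    return alpha / alpha.sum()

# Hypothetical SBC values for three competing models
weights = sbc_weights([100.0, 102.0, 110.0])
```

Because the weights decline exponentially in the SBC, a model whose SBC is only a few points above the best can receive very little weight.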


Example Using the Spread 


In Section 10, we examined seven different ARMA models of the interest rate spread. 
Given that the data end in 2012:4, if you were to use each of the seven models to
make a one-step-ahead forecast for 2013:1, you should find

                 AR(7)   AR(6)   AR(2)   AR(||1,2,7||)   ARMA(1,1)   ARMA(2,1)   ARMA(2,||1,7||)
$f_{i,2013:1}$   0.775   0.775   0.709   0.687           0.729       0.725       0.799


Simple averaging of the individual forecasts (i.e., setting all weights equal to 1/7)
results in a combined forecast of 0.743. Now, use the methodology discussed in
Section 10 to construct 50 one-step-ahead out-of-sample forecasts for each of the
seven models so as to obtain the $f_{it}$ series. After constructing the $e_{it}$ sequences as
$f_{it} - y_t$, it is trivial to find the seven values of $\mathrm{var}(e_{it})$. If you use (2.74), you should
find that the forecast error variances and the associated weights are


                      AR(7)   AR(6)   AR(2)   AR(||1,2,7||)   ARMA(1,1)   ARMA(2,1)   ARMA(2,||1,7||)
$\mathrm{var}(e_{it})$   0.635   0.618   0.583   0.587           0.582       0.600       0.606
$w_i$                 0.135   0.139   0.147   0.146           0.148       0.143       0.141


The weights are very similar because the forecast error variances are alike. Weighting
the individual forecasts yields the composite forecast $f_{c,2013:1} = 0.741$.
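
The calculations behind the simple average and the weights from (2.74) can be reproduced from the rounded figures shown in the two tables. The sketch below uses only those reported values; because the variances are rounded to three decimal places, the results should agree with the text only up to small rounding differences.

```python
import numpy as np

# One-step-ahead forecasts for 2013:1 and forecast error variances, as reported above
forecasts = np.array([0.775, 0.775, 0.709, 0.687, 0.729, 0.725, 0.799])
variances = np.array([0.635, 0.618, 0.583, 0.587, 0.582, 0.600, 0.606])

# Simple average: every model receives the weight 1/7
simple_average = forecasts.mean()

# Equation (2.74): weights proportional to the reciprocals of the variances
inverse = 1.0 / variances
weights = inverse / inverse.sum()
composite = weights @ forecasts
```

The weights differ from one another only in the second decimal place, so the composite forecast is close to the simple average.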

Next, use the spread ($s_t$) to estimate a regression in the form of (2.75). If you omit
the intercept and constrain the weights to sum to unity, you should obtain


$$s_t = 0.55f_{1t} - 0.25f_{2t} - 2.37f_{3t} + 2.44f_{4t} + 0.84f_{5t} - 0.28f_{6t} + 1.17f_{7t} \qquad (2.76)$$


Although some researchers would include the negative weights in (2.76), most 
would eliminate those that are negative. If you successively reestimate the model by 
eliminating the forecast with the most negative coefficient, you should obtain 


$$s_t = 0.326f_{4t} + 0.170f_{5t} + 0.504f_{7t}$$


All of the coefficients are positive and the residuals do not show any sign of serial 
correlation. As such, it is reasonable to use the weights 0.326, 0.170, and 0.504 for 
the forecasts from the AR(||1,2,7||), ARMA(1, 1), and ARMA(2, ||1,7||) models,
respectively. The composite forecast using the regression method is 0.326(0.687) +
0.170(0.729) + 0.504(0.799) = 0.751.

Finally, if you use the values of the SBC as weights, you should obtain

          AR(7)   AR(6)   AR(2)   AR(||1,2,7||)   ARMA(1,1)   ARMA(2,1)   ARMA(2,||1,7||)
$w_i$     0.000   0.000   0.011   0.001           0.112       0.103       0.773


The composite forecast using SBC weights is 0.782. In actuality, the spread in 
2013:1 turned out to be 0.74 (the actual data contains only two decimal places). Of 
the four methods, simple averaging and weighting by the forecast error variances did 
quite well. In this instance, the regression method and constructing the weights using 
the SBC provided the worst composite forecasts. 
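
The comparison of the four methods can be verified from the weights and forecasts reported in this section. The sketch below is illustrative and uses only the rounded values given in the text; the dictionary keys are labels chosen here.

```python
import numpy as np

# Forecasts for 2013:1 from the seven models and the realized value of the spread
forecasts = np.array([0.775, 0.775, 0.709, 0.687, 0.729, 0.725, 0.799])
actual = 0.74

# Weights as reported in the text for each combination method
weights = {
    "simple average": np.full(7, 1.0 / 7.0),
    "inverse variance": np.array([0.135, 0.139, 0.147, 0.146, 0.148, 0.143, 0.141]),
    "regression": np.array([0.0, 0.0, 0.0, 0.326, 0.170, 0.0, 0.504]),
    "SBC": np.array([0.0, 0.0, 0.011, 0.001, 0.112, 0.103, 0.773]),
}

composites = {name: float(w @ forecasts) for name, w in weights.items()}
abs_errors = {name: abs(value - actual) for name, value in composites.items()}
ranking = sorted(abs_errors, key=abs_errors.get)   # methods from smallest to largest error
```

A single forecast origin is, of course, too little evidence to rank the methods in general.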


14. SUMMARY AND CONCLUSIONS 


The chapter focuses on the Box—Jenkins (1976) approach to identification, estimation, 
diagnostic checking, and forecasting a univariate time series. ARMA models can be 
viewed as a special class of linear stochastic difference equations. By definition, an 
ARMA model is covariance stationary in that it has a finite and time-invariant mean 
and covariances. For an ARMA model to be stationary, the characteristic roots of the 
difference equation must lie inside the unit circle. Moreover, the process must have 
started infinitely far in the past or the process must always be in equilibrium. 

In the identification stage, the series is plotted, and the sample autocorrelations and 
partial correlations are examined. A slowly decaying autocorrelation function suggests 
nonstationary behavior. In such circumstances, Box and Jenkins recommend differencing
the data. Formal tests for nonstationarity are presented in Chapter 4. A common
practice is to use a logarithmic or Box—Cox transformation if the variance does not 
appear to be constant. Chapter 3 presents some modern techniques that can be used to 
model the variance. 

The sample autocorrelations and partial correlations of the suitably transformed 
data are compared to those of various theoretical ARMA processes. All plausible mod- 
els are estimated and compared using a battery of diagnostic criteria. A well-estimated 
model (i) is parsimonious, (ii) has coefficients that imply stationarity and invertibility, 
(iii) fits the data well, (iv) has residuals that approximate a white-noise process, (v) has 
coefficients that do not change over the sample period, and (vi) has good out-of-sample 
forecasts. 

A useful check for coefficient instability involves recursive estimation techniques. 
A sudden change in the recursive estimates of one or more coefficients is indicative of a 
structural break. The Chow test can be used to test for a break at a known date and more 
gradual changes can be detected by recursive estimation or by a CUSUM test. As dis- 
cussed in Chapter 7, the Andrews and Ploberger (1994) test can detect an endogenous 
break. Bai and Perron (1998, 2003) show how to test for multiple endogenous breaks. 

In utilizing the Box—Jenkins methodology, you will find yourself making many 
seemingly ad hoc choices. The most parsimonious model may not have the best fit 
but may have the best out-of-sample forecasts. You will find yourself addressing the
following types of questions: What is the most appropriate data transformation? Is an
ARMA(2, 1) model more appropriate than an ARMA(1, 2) specification? How can sea- 
sonality best be modeled? What should be done about seemingly significant coefficients 
at reasonably long lags? Given this latitude, many view the Box—Jenkins methodology 
as an art rather than a science. Nevertheless, the technique is best learned through expe- 
rience. The exercises at the end of this chapter are designed to guide you through the 
types of choices you will encounter in your own research. 


QUESTIONS AND EXERCISES 


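1. In the coin-tossing example of Section 1, your average winnings on the last four tosses ($w_t$)
can be denoted by

$$w_t = 1/4\,\varepsilon_t + 1/4\,\varepsilon_{t-1} + 1/4\,\varepsilon_{t-2} + 1/4\,\varepsilon_{t-3}$$

a. Find the expected value of your winnings. Find the expected value given that
$\varepsilon_{t-3} = \varepsilon_{t-2} = 1$.

b. Find $\mathrm{var}(w_t)$. Find $\mathrm{var}(w_t)$ conditional on $\varepsilon_{t-3} = \varepsilon_{t-2} = 1$.

c. Find $\mathrm{cov}(w_t, w_{t-1})$, $\mathrm{cov}(w_t, w_{t-2})$, and $\mathrm{cov}(w_t, w_{t-5})$.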


2. Consider the second-order autoregressive process $y_t = a_0 + a_2 y_{t-2} + \varepsilon_t$ where $|a_2| < 1$.

a. Find:
i. $E_{t-2}y_t$   ii. $E_{t-1}y_t$   iii. $E_t y_{t+2}$   iv. $\mathrm{cov}(y_t, y_{t-1})$
v. $\mathrm{cov}(y_t, y_{t-2})$   vi. the partial autocorrelations $\phi_{11}$ and $\phi_{22}$.

b. Find the impulse response function. Given $y_{t-2}$, trace out the effects of an $\varepsilon_t$ shock on the
$\{y_t\}$ sequence.

c. Determine the forecast function: $E_t y_{t+s}$. The forecast error $e_t(s)$ is the difference between
$y_{t+s}$ and $E_t y_{t+s}$. Derive the correlogram of the $\{e_t(s)\}$ sequence. {Hint: Find $E_t e_t(s)$,
$\mathrm{var}[e_t(s)]$, and $E_t[e_t(s)e_t(s - j)]$ for $j = 0$ to $s$}.


3. Substitute (2.10) into $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$. Show that the resulting equation is an identity.

a. Find the homogeneous solution to $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$.
b. Find the particular solution given that $|a_1| < 1$.
c. Show how to obtain (2.10) by combining the homogeneous and particular solutions.


4. The general solution to an nth-order difference equation requires n arbitrary constants.
Consider the second-order equation $y_t = a_0 + 0.75y_{t-1} - 0.125y_{t-2} + \varepsilon_t$.

a. Find the homogeneous and particular solutions. Discuss the shape of the impulse
response function.

b. Find the values of the initial conditions that ensure the $\{y_t\}$ sequence is stationary.

c. Given your answer to part (b), derive the correlogram for the $\{y_t\}$ sequence.


5. Consider the second-order stochastic difference equation: $y_t = 1.5y_{t-1} - 0.5y_{t-2} + \varepsilon_t$.

a. Find the characteristic roots of the homogeneous equation.

b. Demonstrate that the roots of $1 - 1.5L + 0.5L^2$ are the reciprocals of your answer in
part a.

c. Given initial conditions for $y_0$ and $y_1$, find the solution for $y_t$ in terms of the current and
past values of the $\{\varepsilon_t\}$ sequence.

d. Find the forecast function for $y_{t+s}$ (i.e., find the solution for the values of $y_{t+s}$ given the
values of $y_t$ and $y_{t-1}$).

e. Find $Ey_t$, $Ey_{t+1}$, $\mathrm{var}(y_t)$, $\mathrm{var}(y_{t+1})$, and $\mathrm{cov}(y_{t+1}, y_t)$.




6. There are often several representations for the identical time-series process. In the text, the
standard equation for an AR(1) model is given by $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$.
a. Show that equivalent representations are i. $(y_t - \bar{y}) = a_1(y_{t-1} - \bar{y}) + \varepsilon_t$ where $\bar{y}$ is the
unconditional mean of the $\{y_t\}$ series and ii. $y_t = a_0/(1 - a_1) + \mu_t$ where $\mu_t = a_1\mu_{t-1} + \varepsilon_t$.
b. In Chapter 1, we considered several models with a deterministic time trend. For
example, a modified version of equation (1.62) is $y_t = a_0 + a_1 y_{t-1} + a_2 t + \varepsilon_t$ where
$|a_1| < 1$. Explain why the $y_t$ sequence is not stationary. Also, explain why the $y_t$
sequence is stationary about the trend line $a_0 + a_2 t$. What does it mean to say that the $y_t$
sequence is trend stationary?
c. Verify that the process generated by $y_t = 16.2 + 0.2t + \mu_t$ where $\mu_t = 0.95\mu_{t-1} + \varepsilon_t$ is
identical to the process generated by $y_t = 1 + 0.95y_{t-1} + 0.01t + \varepsilon_t$.
d. Show that the first-order trend-stationary process $y_t = a_0 + a_1 y_{t-1} + a_2 t + \varepsilon_t$ where
$|a_1| < 1$ can be written in the form $y_t = c_0 + c_1 t + \mu_t$ where $\mu_t = c_3\mu_{t-1} + \varepsilon_t$. Also, use
the method of undetermined coefficients to find the values of $c_0$, $c_1$, and $c_3$.
7. As you read more of the time-series literature, you will find that different authors and dif- 
ferent software packages report the AIC and the SBC in various ways. The purpose of 
this exercise is to show that, regardless of the method you use, you will always select the 
same model. The examples in the text use $AIC = T\ln(SSR) + 2n$ and $SBC = T\ln(SSR) +
n\ln(T)$ where SSR = sum of squared residuals. However, other common formulas include


$$AIC^* = -2\ln(L)/T + 2n/T \quad \text{and} \quad SBC^* = -2\ln(L)/T + n\ln(T)/T$$

and

$$AIC' = \exp(2n/T)\cdot SSR/T \quad \text{and} \quad SBC' = T^{n/T}\cdot SSR/T$$


where SSR = sum of squared residuals, $\ln(L)$ = maximized value of the log of the likelihood
function $= -(T/2)\ln(2\pi) - (T/2)\ln(\sigma^2) - \frac{1}{2\sigma^2}(SSR)$, and $\sigma^2$ = variance of the
residuals.

a. Jennifer estimates two different models over the same time period and assesses their fit
using the formula $AIC^* = -2\ln(L)/T + 2n/T$. She denotes the two values AIC*(1) and
AIC*(2) and finds that AIC*(1) < AIC*(2). Justin estimates the same two models over
the same time period but assesses the fit using the formula $AIC = T\ln(SSR) + 2n$. Show
that Justin’s results must be such that AIC(1) < AIC(2).

Hint: Since AIC*(1) < AIC*(2), it must be the case that

$$\ln(2\pi) + \ln(\sigma_1^2) + (1/T)(1/\sigma_1^2)(SSR_1) + 2n_1/T < \ln(2\pi) + \ln(\sigma_2^2) + (1/T)(1/\sigma_2^2)(SSR_2) + 2n_2/T$$

where $n_i$, $SSR_i$, and $\sigma_i^2$ are the number of parameters, the sum of squared residuals, and
the residual variance of model $i$, respectively. Recall that the estimate of $\sigma^2$ is SSR/T.
If you simplify the inequality relationship, you should find that it is equivalent to
$T\ln(SSR_1) + 2n_1 < T\ln(SSR_2) + 2n_2$.

b. Show that all three methods of calculating the SBC will necessarily select the same
model. 

c. Select one of the three pairs above. Show that the AIC will never select a more parsimo- 
nious model than the SBC. 

8. The file entitled SIM_2.XLS contains the simulated data sets used in this chapter. The 
first series, denoted Y1, contains the 100 values of the simulated AR(1) process used in 
Section 7. Use this series to perform the following tasks (Note: Due to differences in data 
handling and rounding, your answers need only approximate those presented here.) 


a. Plot the sequence against time. Does the series appear to be stationary? 
b. Use the data to verify the results given in Table 2.2. 


c. Estimate the series as an AR(2) process without an intercept. Letting $e_t$ denote the
residual (which may be serially correlated), you should obtain


$$y_t = \underset{(7.01)}{0.701}\,y_{t-1} + \underset{(1.047)}{0.105}\,y_{t-2} + e_t \qquad \text{usable observations: 98}$$


Ljung–Box Q-statistics: Q(8) = 5.13, Q(16) = 15.86, Q(24) = 21.02.
d. Estimate the series as an ARMA(1, 1) process without an intercept. You should obtain


$$y_t = \underset{(12.16)}{0.844}\,y_{t-1} - \underset{(-1.12)}{0.144}\,e_{t-1} + e_t \qquad \text{usable observations: 99}$$


Verify that the ACF and PACF of the residuals do not indicate any serial correlation. 


9. The second column in file SIM_2.XLS contains the 100 values of the simulated
ARMA(1, 1) process used in Section 7. This series is entitled Y2. Use this series to perform
the following tasks (Note: Due to differences in data handling and rounding, your answers 
need only approximate those presented here.): 

a. Plot the sequence against time. Does the series appear to be stationary? Plot the ACF. 

b. Verify the results in Table 2.3. 

c. Estimate the process using a pure MA(2) model. You should obtain 


$$y_t = \underset{(-13.22)}{-1.15}\,e_{t-1} + \underset{(5.98)}{0.522}\,e_{t-2} + e_t \qquad \text{usable observations: 100}$$


Verify that the Ljung–Box Q-statistics are Q(8) = 28.48, Q(16) = 37.47, and Q(24) =
38.84 with significance levels of 0.000, 0.000, and 0.015, respectively.
d. Compare the MA(2) to the ARMA(1, 1). 
10. The third column in file SIM_2.XLS contains the 100 values of the simulated AR(2) process
used in Section 7. This series is entitled Y3. Use this series to perform the following tasks 
(Note: Due to differences in data handling and rounding, your answers need only approxi- 
mate those presented here.): 
a. Plot the sequence against time. Verify the ACF and the PACF coefficients reported in 
Section 7. Compare the sample ACF and PACF to those of a theoretical AR(2) process. 
b. Estimate the series as an AR(1) process. You should find that the estimated AR(1) coef- 
ficient and the t-statistic are 


$$y_t = \underset{(5.24)}{0.467}\,y_{t-1} + e_t$$


Show that the standard diagnostic checks indicate that this AR(1) model is inadequate. 
Be sure to perform a recursive estimation of the AR(1) model and to plot the CUSUMs. 

c. Could an ARMA(1, 1) process generate the type of sample ACF and PACF found in 
part a? Estimate the series as an ARMA(1, 1) process. You should obtain 


$$y_t = \underset{(1.15)}{0.183}\,y_{t-1} + \underset{(3.64)}{0.510}\,e_{t-1} + e_t \qquad \text{usable observations: 99}$$


Use the Ljung—Box Q-statistics to show that the ARMA(1, 1) model is inadequate. 
d. Estimate the series as an AR(2) process to verify the results reported in the text. 


11. If you have not already done so, download the Programming Manual that accompanies this
text and the data set QUARTERLY.XLS.

a. Section 2.7 examines the price of finished goods as measured by the PPI. Form the
logarithmic change in the PPI as $dly_t = \log(ppi_t) - \log(ppi_{t-1})$. Verify that an AR(||1, 3||)
model of the $dly_t$ series has a better in-sample fit than an AR(3) or an ARMA(1, 1)
specification.

b. How does the out-of-sample fit of the AR(||1, 3||) compare to that of the ARMA(1, 1)?

c. What is the problem in comparing the out-of-sample fit of the AR(||1, 3||) to that of the
AR(3)?

d. Experiment with an AR(5) and an ARMA(2, 1) model (see Exercise 2.1 on page 32 of
the manual) to see how they compare to the AR(||1, 3||).


12. Section 2.9 of the Programming Manual that accompanies this text considers several
seasonal models of the variable Currency (Curr) on the data set QUARTERLY.XLS.

a. First-difference the log of $curr_t$ and obtain the ACF and PACF of the resultant series.
Does the seasonal pattern best reflect an AR, MA, or a mixed pattern? Why is there a
problem in estimating the first difference using the Box–Jenkins methodology?

b. Now, obtain the ACF and PACF of the seasonal difference of the first difference. What is
likely the pattern present in the ACF and PACF?

c. Although the manual indicates that the ARMA(1, 1, 0)(0, 1, 1) has the best in-sample fit,
prepare a careful comparison of this model with an ARMA(0, 1, 1)(0, 1, 1) specification.

13. The file QUARTERLY.XLS contains a number of series including the U.S. index of
industrial production (indprod), unemployment rate (urate), and producer price index for
finished goods (finished). All of the series run from 1960Q1 to 2012Q4.

a. Exercises with indprod.

i. Construct the growth rate of the series as $y_t = \log(indprod_t) - \log(indprod_{t-1})$. Since
the first few autocorrelations suggest an AR(1), estimate $y_t = 0.0028 + 0.600y_{t-1} + \varepsilon_t$
(the t-statistics are 2.96 and 10.95, respectively).

ii. Show that adding an AR term at lag 8 improves the fit and removes some of the serial
correlation. What concerns do you have about simply adding an AR(||8||) term to the
industrial production series?

b. Exercises with urate.

i. Graph the time path and the ACF of the series. Do you have any concerns that the
series may not be covariance stationary with normally distributed errors?

ii. Temporarily ignore the issue of differencing the series. Estimate urate as an
AR(2) process including an intercept. You should find $y_t = 0.226 + 1.65y_{t-1} -
0.683y_{t-2} + \varepsilon_t$.

iii. Find the characteristic roots of the deterministic part of the difference equation and
discuss the nature of the implied adjustment process.

iv. Compare the model of ii to that obtained by estimating the first difference of the
series as an AR(1) process.

c. Exercises with cpicore.

i. It is not very often that we need to second difference a series. However, construct the
inflation rate as measured by the core CPI as $dly_t = \log(cpicore_t) - \log(cpicore_{t-1})$.
Form the ACF and PACF of the series and indicate why a Box–Jenkins modeler
might want to work with the second difference of the logarithm of the core CPI.

ii. Let $d2ly_t$ denote the second difference of the $dly_t$ series. Find the best model of the
$d2ly_t$ series. In particular, show that an MA(1) model fits the data better than an
AR(1).

iii. Does the MA(1) or the AR(1) have better forecasting properties?

iv. Estimate the $dly_t$ series as an AR(2) process. Beginning with 2013:1, use your
answer to obtain the 1-step- through 12-step-ahead forecasts of the $cpicore_t$
series. Compare these to the forecasts of $cpicore_t$ from the $d2ly_t$ estimated as an
MA(1). (Note: You will need to transform your forecasts to the forecasts of the
$cpicore_t$.)

14. The file QUARTERLY.XLS contains U.S. interest rate data from 1960Q1 to 2012Q4. As
indicated in Section 10, form the spread by subtracting the T-bill rate from the 5-year rate.
a. Use the full sample period to obtain estimates of the AR(7) and the ARMA(1, 1) model
reported in Section 10.

b. Estimate the AR(7) and ARMA(1, 1) models over the period 1960Q1–2000Q3.
Obtain the one-step-ahead forecast and the one-step-ahead forecast error from each.
As in Section 10, continue to update the estimation period so as to obtain the 50
one-step-ahead forecast errors from each model. Let $f_{1t}$ denote the forecasts from the
AR(7) and $f_{2t}$ denote the forecasts from the ARMA(1, 1). You should find that the
properties of the forecasts are such that


$y_{2000Q3+t} = 0.0536 + 0.968f_{1t}$ and $y_{2000Q3+t} = -0.005 + 1.000f_{2t}$.


Are the forecasts unbiased?

c. Construct the Diebold–Mariano test using the mean absolute error. How do the results
compare to those reported in Section 10?

d. Use the Granger—Newbold test to compare the AR(7) model to the ARMA(1, 1). 

e. Construct the ACF and PACF of the first difference of the spread. What type of model is 
suggested? 

f. Show that a model with 2 AR lags and MA lags at 3 and 8 has a better fit than any of the 
models reported in the text. What do you think about such a model? 


15. The file QUARTERLY.XLS contains the U.S. money supply as measured by M1 (M1NSA)
and as measured by M2 (M2NSA). The series are quarterly averages over the period 1960:1
to 2012:4.

a. Reproduce the results for M1 that are reported in Section 11 of the text. 

b. How do the three models of M1 reported in the text compare to a model with a seasonal 
AR(1) term with an additive MA(1) term? 

c. Obtain the ACF for the growth rate of the M2NSA series. What type of model is sug- 
gested by the ACF? 

d. Denote the seasonally differenced growth rate of M2NSA by $m2_t$. Estimate an AR(1)
model with a seasonal MA term over the 1962:3 to 2014:4 period. You should obtain
$m2_t = 0.5412\,m2_{t-1} + \varepsilon_t - 0.8682\,\varepsilon_{t-4}$. Show that this model is preferable to (i) an AR(1)
with a seasonal AR term, (ii) an MA(1) with a seasonal AR term, and (iii) an MA(1) with a
seasonal MA term.

e. Would you recommend including an MA term at lag 2 to remove any remaining serial 
correlation in the residuals? 


16. The file labeled Y_BREAK.XLS contains the 150 observations of the series constructed as
$y_t = 1 + 0.5y_{t-1} + (1 + 0.1y_{t-1})D_t + \varepsilon_t$, where $D_t$ is a dummy variable equal to 0 for $t < 101$
and equal to 1.5 for $t \geq 101$.

a. Explain how this representation of the model allows the intercept to jump from 1 to 2.5 
and the AR(1) coefficient to jump from 0.5 to 0.65. 

b. Use the data to verify the results reported in the text. 

c. Why do you think that the estimated intercept actually falls beginning with period 101? 

d. Estimate the series as an AR(2) process. In what sense does the AR(2) model perform 
better than the AR(1) model estimated in part a? 

e. Perform a recursive estimation of the AR(2) model and plot the CUSUMs. Is the AR(2) 
model adequate? 


CHAPTER 3 


MODELING VOLATILITY 


Learning Objectives 


1. Examine the so-called stylized facts concerning the properties of economic
time-series data.
2. Introduce the basic ARCH and GARCH models.
3. Show how ARCH and GARCH models have been used to estimate inflation
rate volatility.
4. Illustrate how GARCH models can capture the volatility of oil prices, real
U.S. GDP, and the interest rate spread.
5. Show how a GARCH model can be used to estimate risk in a particular
sector of the economy.
6. Explain how to estimate a time-varying risk premium using the ARCH-M
model.
7. Explore the properties of the GARCH(1, 1) model and forecasts from
GARCH models.
8. Derive the maximum likelihood function for a GARCH process.
9. Explain several other important forms of GARCH models including
IGARCH, asymmetric TARCH, and EGARCH models.
10. Illustrate the process of estimating a GARCH model using the NYSE 100
Index.
11. Show how multivariate GARCH models can be used to capture volatility
spillovers.
12. Develop volatility impulse response functions and illustrate the estimation
technique using exchange rate data.


Many economic time series do not have a constant mean, and most exhibit phases of 
relative tranquility followed by periods of high volatility. Much of the current
econometric research is concerned with extending the Box-Jenkins methodology to analyze
these types of time-series variables. 


1. ECONOMIC TIME SERIES: 
THE STYLIZED FACTS 


Figures 3.1 through 3.6 illustrate the behavior of some of the more important variables
encountered in macroeconomic analysis. Casual inspection does have its perils, and formal
testing is necessary to substantiate any first impressions. However, the strong visual
pattern is that these series are not stationary; the sample means do not appear to be constant,
and/or there is the strong appearance of heteroskedasticity. We can characterize
the key features of the various series with these stylized facts:


1. Many of the series contain a clear trend. Real and potential U.S. GDP, consumption,
and investment (see Figure 3.1) exhibit a decidedly upward trend.
Note that real consumption is smoother than real GDP and that real investment
is more volatile than real GDP.


2. The volatility of many series is not constant over time. Real investment
grew smoothly throughout most of the 1960s but became highly variable
in the 1970s and in 2007 with the onset of the financial crisis. Note that the
volatility of real GDP (see Figure 3.2) appears to fall in 1984, shows a negative
spike in 2007, and then stabilizes. More dramatic are the daily changes
in the log of the NYSE U.S. 100 stock price index. In Figure 3.3, you can see
periods where the stock market seems tranquil alongside periods with large
increases and decreases in the market. Such series are called conditionally
heteroskedastic if the unconditional (or long-run) variance is constant, but
there are periods in which the variance is relatively high.


3. Shocks to a series can display a high degree of persistence. Neither of the
interest rate series shown in Figure 3.4 has a clear upward or downward trend.
Nevertheless, both show a high degree of persistence. Notice that the 3-month
T-bill rate experienced two upward surges in the 1970s and remained at
those high levels for several years. Similarly, after a sharp decrease in the late
1980s, the rate never again displayed the levels attained in the early 1980s.

FIGURE 3.1 Real GDP, Consumption, and Investment (billions of 2005 dollars)
FIGURE 3.2 Annualized Growth Rate of Real GDP
FIGURE 3.3 Percentage Change in the NYSE U.S. 100 (January 4, 2000-July 16, 2012)
FIGURE 3.4 Short- and Long-Term Interest Rates


4. Some series seem to meander. Both the euro and Swiss franc appear to
have a slight upward trend whereas the British pound (see Figure 3.5)
shows no particular propensity to increase or decrease. Nevertheless, in
the short run, the values of all three exchange rates go through sustained
periods of appreciation and depreciation without a tendency to revert to a
long-run mean. This type of “random walk” or “drifting” behavior is typical
of nonstationary series.

FIGURE 3.5 Daily Exchange Rates (January 3, 2000-April 4, 2013)


5. Some series share comovements with other series. Individually, the
3-month T-bill rate and the 5-year yield on U.S. government securities do not 
appear to be stationary. Even though the rates show no tendency for mean 
reversion, the two series never drift too far apart. Moreover, large shocks to 
the 3-month rate appear to be timed similarly with those to the 5-year rate. 
The presence of such comovements should not be too surprising since the 
forces driving short-term and long-term rates should be similar. On the other 
hand, it is not clear whether the various exchange rate series exhibit the same 
long-run trend. The movements in the three series are such that all seem to 
experience appreciations and depreciations simultaneously. However, it is not 
clear whether the differences among the trend rates of growth are statistically 
significant. 


6. Some of the series exhibit breaks. The financial crisis of 2007-2008 caused
a number of time series to experience structural breaks. Notice how real GDP, 
consumption, and investment all show particularly sharp declines at the time 
of the crisis. Also note that the pound depreciated sharply, whereas the euro 
and Swiss franc declined less dramatically. As economic activity declined, so 
did the price of oil (see Figure 3.6) as evidenced by the extremely sharp drop 
in the spot price of Brent crude oil. 


Please be aware that “eyeballing” the data is not a substitute for formally testing for
the presence of conditional heteroskedasticity or for nonstationary behavior. Although
most of the variables shown in the figures are nonstationary and/or heteroskedastic, the
issue will not always be so obvious. Fortunately, it is possible to modify the tools developed
in the last chapter to help in the identification and estimation of such series. The
remainder of this chapter considers the issue of conditional heteroskedasticity. Models
and formal tests for the presence of trends (deterministic and/or stochastic) are
discussed in the next chapter. The order in which you read Chapters 3 and 4 is immaterial;
some instructors may wish to cover the material in Chapter 4 and then the material
in Chapter 3. However, the issue of comovements in multivariate time series must wait
until Chapters 5 and 6. Potential nonlinearities and structural breaks are considered in
Chapter 7.

FIGURE 3.6 Weekly Values of the Spot Price of Oil (May 15, 1987-November 1, 2013)


2. ARCH AND GARCH PROCESSES 


In conventional econometric models, the variance of the disturbance term is assumed to 
be constant. However, Figures 3.2 and 3.3 demonstrate that many economic time series 
exhibit periods of unusually large volatility followed by periods of relative tranquility. 
In such circumstances, the assumption of a constant variance (homoskedasticity) is 
inappropriate. It is easy to imagine instances in which you might want to forecast the 
conditional variance of a series. As an asset holder, you would be interested in forecasts 
of the rate of return and its variance over the holding period. The unconditional variance 
(i.e., the long-run forecast of the variance) would be unimportant if you plan to buy the 
asset at $t$ and sell at $t + 1$.

One approach to forecasting the variance is to explicitly introduce an independent
variable that helps to predict the volatility. Consider the simplest case in which

$$y_{t+1} = \varepsilon_{t+1} x_t$$

where: $y_{t+1}$ is the variable of interest
$\varepsilon_{t+1}$ is a white-noise disturbance term with variance $\sigma^2$
$x_t$ is an independent variable that can be observed at period $t$

If $x_t = x_{t-1} = x_{t-2} = \cdots$ = constant, the $\{y_t\}$ sequence is the familiar white-noise
process with a constant variance. However, when the realizations of the $\{x_t\}$ sequence
are not all equal, the variance of $y_{t+1}$ conditional on the observable value of $x_t$ is

$$\operatorname{var}(y_{t+1} \mid x_t) = x_t^2 \sigma^2$$

Here the conditional variance of $y_{t+1}$ is dependent on the realized value of $x_t$. Since you
can observe $x_t$ at time period $t$, you can form the variance of $y_{t+1}$ conditionally on the
realized value of $x_t$. If the magnitude $(x_t)^2$ is large (small), the variance of $y_{t+1}$ will be
large (small) as well. Furthermore, if the successive values of $\{x_t\}$ exhibit positive serial
correlation (so that a large value of $x_t$ tends to be followed by a large value of $x_{t+1}$), the
conditional variance of the $\{y_t\}$ sequence will exhibit positive serial correlation as well.
In this way, the introduction of the $\{x_t\}$ sequence can explain periods of volatility in the
$\{y_t\}$ sequence. In practice, you might want to modify the basic model by introducing
the coefficients $a_0$ and $a_1$ and estimating the regression equation in logarithmic form as

$$\ln(y_t) = a_0 + a_1 \ln(x_{t-1}) + e_t$$


where $e_t$ is the error term [formally, $e_t = \ln(\varepsilon_t)$].

This procedure is simple to implement since the logarithmic transformation results
in a linear regression equation; OLS can be used to estimate $a_0$ and $a_1$ directly. A major
difficulty with this strategy is that it assumes a specific cause for the changing variance.
Moreover, the methodology also forces $\{x_t\}$ to affect the mean of $\ln(y_t)$. Oftentimes,
you might not have a firm theoretical reason for selecting one candidate for the $\{x_t\}$
sequence over other reasonable choices. Was it the oil price shocks, a change in the
conduct of monetary policy, and/or the breakdown of the Bretton Woods system that
was responsible for the volatility of real investment during the 1970s? Moreover, the
technique necessitates a transformation of the data such that the resulting series has
a constant variance. In the example at hand, the $\{e_t\}$ sequence is assumed to have a
constant variance. If this assumption is violated, some other transformation of the data
is necessary.
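To make the idea of an observable variable driving volatility concrete, the following is a minimal simulation sketch of a series equal to white noise multiplied by the previous value of x. The process chosen for x (a persistent series fluctuating around 1), the sample size, and all function and variable names are illustrative assumptions, not part of the text.

```python
import numpy as np

def simulate_scaled_noise(n=50_000, sigma=1.0, seed=0):
    """Simulate y_{t+1} = eps_{t+1} * x_t with a persistent, observable x_t."""
    rng = np.random.default_rng(seed)
    # Hypothetical choice: x_t follows a persistent AR(1) around a mean of 1.
    x = np.empty(n)
    x[0] = 1.0
    shocks = rng.normal(0.0, 0.1, size=n)
    for t in range(1, n):
        x[t] = 0.1 + 0.9 * x[t - 1] + shocks[t]
    eps = rng.normal(0.0, sigma, size=n)
    y_next = eps[1:] * x[:-1]          # y_{t+1} = eps_{t+1} * x_t
    return x[:-1], y_next

x, y_next = simulate_scaled_noise()

# Split the sample by the size of x_t^2 and compare the variance of y_{t+1}.
big = x**2 > np.median(x**2)
var_big = np.var(y_next[big])
var_small = np.var(y_next[~big])
print(var_big, var_small)              # the first should exceed the second
```

Because x is serially correlated in this sketch, periods of large x cluster together, so the simulated y shows runs of high and low volatility even though the noise term itself has a constant variance.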


ARCH Processes 


Instead of using ad hoc variable choices for $x_t$ and/or data transformations, Engle
(1982) shows that it is possible to simultaneously model the mean and the variance
of a series. As a preliminary step to understanding Engle’s methodology, note that conditional
forecasts are vastly superior to unconditional forecasts. To elaborate, suppose
you estimate the stationary ARMA model $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$ and want to forecast
$y_{t+1}$. The conditional mean of $y_{t+1}$ is

$$E_t y_{t+1} = a_0 + a_1 y_t$$

If we use this conditional mean to forecast $y_{t+1}$, the forecast error variance is
$E_t[(y_{t+1} - a_0 - a_1 y_t)^2] = E_t \varepsilon_{t+1}^2 = \sigma^2$. However, if unconditional forecasts are used,
the unconditional forecast is always the long-run mean of the $\{y_t\}$ sequence equal to
$a_0/(1 - a_1)$. The unconditional forecast error variance is

$$E\{[y_{t+1} - a_0/(1 - a_1)]^2\} = E[(\varepsilon_{t+1} + a_1\varepsilon_t + a_1^2\varepsilon_{t-1} + a_1^3\varepsilon_{t-2} + \cdots)^2] = \sigma^2/(1 - a_1^2)$$

Since $1/(1 - a_1^2) > 1$, the unconditional forecast has a greater variance than the
conditional forecast. Thus, conditional forecasts (since they take into account the
known current and past realizations of series) are clearly preferable.
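The comparison of the two forecast error variances can be checked numerically. The sketch below simulates a stationary AR(1) with known coefficients and compares the mean squared error of the conditional forecast with that of the long-run-mean forecast. The parameter values, seed, and function name are assumptions chosen for illustration.

```python
import numpy as np

def forecast_error_variances(a0=1.0, a1=0.8, sigma=1.0, n=200_000, seed=1):
    """Mean squared one-step errors of conditional and unconditional forecasts."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    y = np.empty(n)
    y[0] = a0 / (1.0 - a1)                    # start at the long-run mean
    for t in range(1, n):
        y[t] = a0 + a1 * y[t - 1] + eps[t]
    cond_err = y[1:] - (a0 + a1 * y[:-1])     # error of the conditional forecast
    uncond_err = y[1:] - a0 / (1.0 - a1)      # error of the long-run-mean forecast
    return np.mean(cond_err**2), np.mean(uncond_err**2)

cond_var, uncond_var = forecast_error_variances()
print(cond_var)     # should be close to sigma^2 = 1
print(uncond_var)   # should be close to sigma^2 / (1 - a1^2), roughly 2.78
```

With a coefficient of 0.8, the unconditional forecast error variance is almost three times the conditional one, which is the sense in which conditional forecasts are preferable.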

Similarly, if the variance of $\{\varepsilon_t\}$ is not constant, you can estimate any tendency
for sustained movements in the variance using an ARMA model. For example, let $\{\hat{\varepsilon}_t\}$
denote the estimated residuals from the model $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$ so that the conditional
variance of $y_{t+1}$ is

$$\operatorname{var}(y_{t+1} \mid y_t) = E_t[(y_{t+1} - a_0 - a_1 y_t)^2] = E_t(\varepsilon_{t+1})^2$$

To this point, we have set $E_t(\varepsilon_{t+1})^2$ equal to the constant $\sigma^2$. Now suppose that the
conditional variance is not constant. One simple strategy is to model the conditional
variance as an AR($q$) process using squares of the estimated residuals

$$\hat{\varepsilon}_t^2 = \alpha_0 + \alpha_1\hat{\varepsilon}_{t-1}^2 + \alpha_2\hat{\varepsilon}_{t-2}^2 + \cdots + \alpha_q\hat{\varepsilon}_{t-q}^2 + v_t \tag{3.1}$$


where $v_t$ is a white-noise process.

If the values of $\alpha_1, \alpha_2, \ldots, \alpha_q$ all equal zero, the estimated variance is simply the
constant $\alpha_0$. Otherwise, the conditional variance of $y_t$ evolves according to the autoregressive
process given by (3.1). As such, you can use (3.1) to forecast the conditional
variance at $t + 1$ as

$$E_t\hat{\varepsilon}_{t+1}^2 = \alpha_0 + \alpha_1\hat{\varepsilon}_t^2 + \alpha_2\hat{\varepsilon}_{t-1}^2 + \cdots + \alpha_q\hat{\varepsilon}_{t+1-q}^2$$

For this reason, an equation like (3.1) is called an autoregressive conditional 
heteroskedastic (ARCH) model. There are many possible applications for ARCH 
models since the residuals in (3.1) can come from an autoregression, an ARMA model, 
or a standard regression model. 
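As a rough sketch of how an equation like (3.1) could be used in practice, the code below regresses squared residuals on a constant and their own lags by OLS and then forms the one-step-ahead variance forecast. The function names, the simulated residual series, and its parameter values (1.0 and 0.2) are hypothetical; in applied work the residuals would come from your own fitted model of the mean.

```python
import numpy as np

def fit_squared_residual_ar(resid, q):
    """OLS fit of e_t^2 = alpha_0 + alpha_1 e_{t-1}^2 + ... + alpha_q e_{t-q}^2 + v_t."""
    e2 = np.asarray(resid, dtype=float) ** 2
    T = len(e2)
    # Column i holds the i-th lag of e^2, aligned with the dependent variable e2[q:].
    lags = [e2[q - i : T - i] for i in range(1, q + 1)]
    X = np.column_stack([np.ones(T - q)] + lags)
    alpha, *_ = np.linalg.lstsq(X, e2[q:], rcond=None)
    return alpha

def forecast_next_variance(resid, alpha):
    """One-step forecast alpha_0 + alpha_1 e_t^2 + ... + alpha_q e_{t+1-q}^2."""
    q = len(alpha) - 1
    e2 = np.asarray(resid, dtype=float) ** 2
    recent = e2[::-1][:q]              # e_t^2, e_{t-1}^2, ..., e_{t+1-q}^2
    return alpha[0] + alpha[1:] @ recent

# Illustrative residuals whose variance depends on the previous squared value.
rng = np.random.default_rng(2)
e = np.zeros(50_000)
for t in range(1, len(e)):
    e[t] = rng.normal() * np.sqrt(1.0 + 0.2 * e[t - 1] ** 2)

alpha_hat = fit_squared_residual_ar(e, q=1)
next_var = forecast_next_variance(e, alpha_hat)
print(alpha_hat, next_var)
```

This two-step OLS approach is only a first pass; as the text goes on to explain, the mean and variance equations are better estimated jointly by maximum likelihood.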

In actuality, the linear specification of (3.1) is not the most convenient. The reason
is that the model for $\{y_t\}$ and the conditional variance are best estimated simultaneously
using maximum likelihood techniques. Moreover, instead of the specification given by
(3.1), it is more tractable to specify $v_t$ as a multiplicative disturbance.

The simplest example from the class of multiplicative conditionally heteroskedastic
models proposed by Engle (1982) is

$$\varepsilon_t = v_t\sqrt{\alpha_0 + \alpha_1\varepsilon_{t-1}^2} \tag{3.2}$$

where $v_t$ = white-noise process such that $\sigma_v^2 = 1$, $v_t$ and $\varepsilon_{t-1}$ are independent of each
other, and $\alpha_0$ and $\alpha_1$ are constants such that $\alpha_0 > 0$ and $0 < \alpha_1 < 1$.

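Consider the properties of the proposed $\{\varepsilon_t\}$ sequence. Since $v_t$ is white noise and
is independent of $\varepsilon_{t-1}$, it is easy to show that the elements of the $\{\varepsilon_t\}$ sequence have a
mean of zero and are uncorrelated. The proof is straightforward. Take the unconditional
expectation of $\varepsilon_t$. Since $Ev_t = 0$, it follows that

$$E\varepsilon_t = E[v_t(\alpha_0 + \alpha_1\varepsilon_{t-1}^2)^{1/2}] = Ev_t\,E(\alpha_0 + \alpha_1\varepsilon_{t-1}^2)^{1/2} = 0 \tag{3.3}$$

Since $Ev_t v_{t-i} = 0$, it also follows that

$$E\varepsilon_t\varepsilon_{t-i} = 0, \quad i \neq 0 \tag{3.4}$$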


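The derivation of the unconditional variance of $\varepsilon_t$ is also straightforward. Square
$\varepsilon_t$ and take the unconditional expectation to form

$$E\varepsilon_t^2 = E[v_t^2(\alpha_0 + \alpha_1\varepsilon_{t-1}^2)] = Ev_t^2\,E(\alpha_0 + \alpha_1\varepsilon_{t-1}^2)$$

Since $\sigma_v^2 = 1$ and the unconditional variance of $\varepsilon_t$ is identical to that of $\varepsilon_{t-1}$
(i.e., $E\varepsilon_t^2 = E\varepsilon_{t-1}^2$), the unconditional variance is

$$E\varepsilon_t^2 = \alpha_0/(1 - \alpha_1) \tag{3.5}$$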


Thus, the unconditional mean and variance are unaffected by the presence of the
error process given by (3.2). Similarly, it is easy to show that the conditional mean of $\varepsilon_t$
is equal to zero. Given that $v_t$ and $\varepsilon_{t-1}$ are independent and that $Ev_t = 0$, the conditional
mean of $\varepsilon_t$ is

$$E(\varepsilon_t \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots) = E_{t-1}v_t\,E_{t-1}(\alpha_0 + \alpha_1\varepsilon_{t-1}^2)^{1/2} = 0$$

At this point, you might be thinking that the properties of the $\{\varepsilon_t\}$ sequence are
not affected by (3.2) since the mean is zero, the variance is constant, and all autocovariances
are zero. However, the influence of (3.2) falls entirely on the conditional variance.
Because $Ev_t^2 = 1$, the variance of $\varepsilon_t$ conditioned on the past history of $\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots$ is

$$E[\varepsilon_t^2 \mid \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots] = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 \tag{3.6}$$

In (3.6), the conditional variance of $\varepsilon_t$ is dependent on the realized value of $\varepsilon_{t-1}^2$.
If the realized value of $\varepsilon_{t-1}^2$ is large, the conditional variance in $t$ will be large as well.
In (3.6), the conditional variance is a first-order AutoRegressive Conditionally Heteroskedastic
process denoted by ARCH(1). As opposed to a usual autoregression, the
coefficients $\alpha_0$ and $\alpha_1$ have to be restricted. In order to ensure that the conditional variance
is never negative, it is necessary to assume that both $\alpha_0$ and $\alpha_1$ are positive. After
all, if $\alpha_0$ is negative, a sufficiently small realization of $\varepsilon_{t-1}$ will mean that (3.6) is negative.
Similarly, if $\alpha_1$ is negative, a sufficiently large realization of $\varepsilon_{t-1}$ can render a
negative value for the conditional variance. Moreover, to ensure the stability of the
process, it is necessary to restrict $\alpha_1$ such that $0 < \alpha_1 < 1$.

Equations (3.3)-(3.6) illustrate the essential features of any ARCH process. In
an ARCH model, the conditional and unconditional expectations of the error terms
are equal to zero. Moreover, the $\{\varepsilon_t\}$ sequence is serially uncorrelated because, for all
$s \neq 0$, $E\varepsilon_t\varepsilon_{t-s} = 0$. The key point is that the errors are not independent since they are
related through their second moment (recall that correlation is a linear relationship).
The conditional variance itself is an autoregressive process resulting in conditionally
heteroskedastic errors. When the realized value of $\varepsilon_{t-1}$ is far from zero, so that $\alpha_1\varepsilon_{t-1}^2$
is relatively large, the variance of $\varepsilon_t$ will tend to be large. As you will see momentarily,
the conditional heteroskedasticity in $\{\varepsilon_t\}$ will result in $\{y_t\}$ being heteroskedastic itself.
Thus, the ARCH model is able to capture periods of tranquility and volatility in the $\{y_t\}$
series.
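The properties summarized in (3.3) through (3.6) can be illustrated with a short simulation of the multiplicative ARCH(1) error in (3.2). This is only a sketch: the parameter values (1.0 and 0.5), the sample size, and the helper names are assumptions, and the value 0.5 is chosen so that sample moments settle down reasonably quickly.

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, n, seed=0):
    """Generate eps_t = v_t * sqrt(alpha0 + alpha1 * eps_{t-1}^2) with eps_0 = 0."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)             # white noise with unit variance
    eps = np.zeros(n)
    for t in range(1, n):
        eps[t] = v[t] * np.sqrt(alpha0 + alpha1 * eps[t - 1] ** 2)
    return eps

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    d = x - x.mean()
    return (d[lag:] @ d[:-lag]) / (d @ d)

eps = simulate_arch1(alpha0=1.0, alpha1=0.5, n=200_000)
mean_eps = eps.mean()
var_eps = eps.var()
rho_level = autocorr(eps, 1)
rho_square = autocorr(eps**2, 1)
print(mean_eps)     # near 0, as in (3.3)
print(var_eps)      # near alpha0 / (1 - alpha1) = 2, as in (3.5)
print(rho_level)    # near 0: the levels are serially uncorrelated, as in (3.4)
print(rho_square)   # clearly positive: the squares are related, as in (3.6)
```

The contrast between the last two numbers is the point of the paragraph above: the errors are uncorrelated but not independent, because their squares move together.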

The four panels of Figure 3.7 depict two different ARCH models. Panel (a), representing
the $\{v_t\}$ sequence, shows 100 serially uncorrelated and normally distributed
random deviates. From casual inspection, the $\{v_t\}$ sequence appears to fluctuate around
a mean of zero and have a constant variance. Note the moderate increase in volatility
between periods 50 and 60. Given the initial condition $\varepsilon_0 = 0$, these realizations of the
$\{v_t\}$ sequence were used to construct the next 100 values of the $\{\varepsilon_t\}$ sequence using
equation (3.2) and setting $\alpha_0 = 1$ and $\alpha_1 = 0.8$. As illustrated in Panel (b), the $\{\varepsilon_t\}$
sequence also has a mean of zero, but the variance appears to experience an increase
in volatility around $t = 50$.

How does the error structure affect the $\{y_t\}$ sequence? Clearly, if the autoregressive
parameter $a_1$ is zero, $y_t$ is nothing more than $\varepsilon_t$. Thus, Panel (b) can be used to depict the
time path of the $\{y_t\}$ sequence for the case of $a_1 = 0$. Panels (c) and (d) show the behavior
of the $\{y_t\}$ sequence for the cases of $a_1 = 0.2$ and 0.9, respectively. The essential
point to note is that the ARCH error structure and the autocorrelation parameters of the
$\{y_t\}$ process interact with each other. Comparing Panels (c) and (d) illustrates that the
volatility of $\{y_t\}$ is increasing in $a_1$ and $\alpha_1$. The explanation is intuitive. Any unusually
large (in absolute value) shock in $v_t$ will be associated with a persistently large
variance in the $\{\varepsilon_t\}$ sequence; the larger is $\alpha_1$, the longer the persistence. Moreover,
the greater the autoregressive parameter $a_1$, the more persistent is any given change
in $y_t$. The stronger the tendency for $\{y_t\}$ to remain away from its mean, the greater the
variance.

FIGURE 3.7 Simulated ARCH Processes (panel titles include: White-noise process $v_t$; $\varepsilon_t = v_t\sqrt{1 + 0.8\varepsilon_{t-1}^2}$)
To formally examine the properties of the $\{y_t\}$ sequence, the conditional mean and
variance are given by

$$E_{t-1}y_t = a_0 + a_1 y_{t-1}$$

and

$$\operatorname{var}(y_t \mid y_{t-1}, y_{t-2}, \ldots) = E_{t-1}(y_t - a_0 - a_1 y_{t-1})^2 = E_{t-1}(\varepsilon_t)^2 = \alpha_0 + \alpha_1(\varepsilon_{t-1})^2$$

Since $\alpha_1$ and $\varepsilon_{t-1}^2$ cannot be negative, the minimum value for the conditional variance
is $\alpha_0$. For any nonzero realization of $\varepsilon_{t-1}$, the conditional variance of $y_t$ is positively
related to $\alpha_1$. The unconditional mean and variance of $y_t$ can be obtained by
solving the difference equation for $y_t$ and then taking expectations. If the process began
sufficiently far in the past (so that the arbitrary constant $A$ can safely be ignored), the
solution for $y_t$ is

$$y_t = \frac{a_0}{1 - a_1} + \sum_{i=0}^{\infty} a_1^i \varepsilon_{t-i} \tag{3.7}$$


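Since $E\varepsilon_t = 0$ for all $t$, the unconditional expectation of (3.7) is $Ey_t = a_0/(1 - a_1)$.
The unconditional variance can be obtained in a similar fashion using (3.7). Given
that $E\varepsilon_t\varepsilon_{t-i}$ is zero for all $i \neq 0$, the unconditional variance of $y_t$ follows directly from
(3.7) as

$$\operatorname{var}(y_t) = \sum_{i=0}^{\infty} a_1^{2i} \operatorname{var}(\varepsilon_{t-i})$$

From the result that the unconditional variance of $\varepsilon_t$ is constant [i.e., $\operatorname{var}(\varepsilon_t) =
\operatorname{var}(\varepsilon_{t-1}) = \operatorname{var}(\varepsilon_{t-2}) = \cdots = \alpha_0/(1 - \alpha_1)$], it follows that

$$\operatorname{var}(y_t) = \left(\frac{\alpha_0}{1 - \alpha_1}\right)\left(\frac{1}{1 - a_1^2}\right)$$

Clearly, the variance of the $\{y_t\}$ sequence is increasing in $\alpha_1$ and in the absolute
value of $a_1$. The point clearly generalizes to higher order autoregressive processes.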
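The unconditional variance result can be checked with a small simulation of an AR(1) process driven by ARCH(1) errors. In the sketch below the two autoregressive values 0.2 and 0.9 follow the cases discussed for Figure 3.7, while the ARCH coefficient of 0.4, the zero intercept, the sample size, and the function names are assumptions made only for illustration.

```python
import numpy as np

def simulate_ar1_arch1(a0, a1, alpha0, alpha1, n, seed=0):
    """y_t = a0 + a1*y_{t-1} + eps_t, with eps_t = v_t*sqrt(alpha0 + alpha1*eps_{t-1}^2)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)
    eps = np.zeros(n)
    y = np.full(n, a0 / (1.0 - a1))    # start at the unconditional mean
    for t in range(1, n):
        eps[t] = v[t] * np.sqrt(alpha0 + alpha1 * eps[t - 1] ** 2)
        y[t] = a0 + a1 * y[t - 1] + eps[t]
    return y

def theoretical_variance(a1, alpha0, alpha1):
    """[alpha0 / (1 - alpha1)] * [1 / (1 - a1^2)], the expression derived above."""
    return (alpha0 / (1.0 - alpha1)) * (1.0 / (1.0 - a1**2))

sample_var = {}
for a1 in (0.2, 0.9):
    y = simulate_ar1_arch1(a0=0.0, a1=a1, alpha0=1.0, alpha1=0.4, n=200_000)
    sample_var[a1] = y.var()
    print(a1, sample_var[a1], theoretical_variance(a1, 1.0, 0.4))
```

In a long sample the simulated variances should sit near the theoretical values, and the variance for the more persistent case should be several times larger.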

The ARCH process given by (3.2) has been extended in several interesting
ways. Engle’s (1982) original contribution considered the entire class of higher order
ARCH($q$) processes:

$$\varepsilon_t = v_t\sqrt{\alpha_0 + \sum_{i=1}^{q} \alpha_i\varepsilon_{t-i}^2} \tag{3.8}$$

In (3.8), all shocks from $\varepsilon_{t-1}$ to $\varepsilon_{t-q}$ have a direct effect on $\varepsilon_t$, so that the conditional
variance acts like an autoregressive process of order $q$. It is a good exercise to
demonstrate that the forecast for $E_t\varepsilon_{t+1}^2$ arising from (3.1) is precisely the same as that
from (3.8).
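One way to approach that exercise is sketched below. It assumes, as the text does for (3.2), that the disturbance v at t + 1 has unit variance and is independent of everything known at t, so the bracketed term can be treated as known when expectations are taken at t.

```latex
\begin{aligned}
E_t\varepsilon_{t+1}^2
  &= E_t\!\left[v_{t+1}^2\left(\alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t+1-i}^2\right)\right] \\
  &= E_t v_{t+1}^2\left(\alpha_0 + \sum_{i=1}^{q}\alpha_i\varepsilon_{t+1-i}^2\right) \\
  &= \alpha_0 + \alpha_1\varepsilon_t^2 + \alpha_2\varepsilon_{t-1}^2 + \cdots + \alpha_q\varepsilon_{t+1-q}^2
\end{aligned}
```

The last line has the same form as the one-step forecast obtained from (3.1), with the estimated residuals in place of the true errors.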


The GARCH Model 


Bollerslev (1986) extended Engle’s original work by developing a technique that allows
the conditional variance to be an ARMA process. Now let the error process be such that

$$\varepsilon_t = v_t\sqrt{h_t}$$

where $\sigma_v^2 = 1$ and

$$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i\varepsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i h_{t-i} \tag{3.9}$$

Since $\{v_t\}$ is a white-noise process, the conditional and unconditional means of $\varepsilon_t$
are equal to zero. Taking the expected value of $\varepsilon_t$, it is easy to verify that

$$E\varepsilon_t = Ev_t(h_t)^{1/2} = 0$$

The important point is that the conditional variance of $\varepsilon_t$ is given by $E_{t-1}\varepsilon_t^2 = h_t$.
Thus, the conditional variance of $\varepsilon_t$ is the ARMA process given by the expression $h_t$
in (3.9).

This generalized ARCH($p$, $q$) model, called GARCH($p$, $q$), allows for both
autoregressive and moving average components in the heteroskedastic variance. If we
set $p = 0$ and $q = 1$, it is clear that the first-order ARCH model given by (3.2) is simply
a GARCH(0, 1) model. Similarly, if all values of $\beta_i$ equal zero, the GARCH($p$, $q$)
model is equivalent to an ARCH($q$) model. The benefits of the GARCH model
should be clear; a high-order ARCH model may have a more parsimonious GARCH
representation that is much easier to identify and estimate. This is particularly true
since all coefficients in (3.9) must be positive. Clearly, the more parsimonious model
will entail fewer coefficient restrictions. Moreover, to ensure that the variance is finite,
all characteristic roots of (3.9) must lie inside the unit circle, which implies that the process is stable.¹
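To see the recursion in (3.9) at work, the following sketch simulates the simplest case, a GARCH(1, 1) process. It relies on the standard result that, in this case, the variance is finite when the two slope coefficients sum to less than one, with long-run variance equal to the intercept divided by one minus that sum. The parameter values (0.1, 0.1, 0.8), the sample size, and the function name are illustrative assumptions.

```python
import numpy as np

def simulate_garch11(alpha0, alpha1, beta1, n, seed=0):
    """eps_t = v_t*sqrt(h_t) with h_t = alpha0 + alpha1*eps_{t-1}^2 + beta1*h_{t-1}."""
    if alpha1 + beta1 >= 1.0:
        raise ValueError("alpha1 + beta1 must be below 1 for a finite variance")
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)                    # unit-variance white noise
    h = np.empty(n)
    eps = np.empty(n)
    h[0] = alpha0 / (1.0 - alpha1 - beta1)    # start at the long-run variance
    eps[0] = v[0] * np.sqrt(h[0])
    for t in range(1, n):
        h[t] = alpha0 + alpha1 * eps[t - 1] ** 2 + beta1 * h[t - 1]
        eps[t] = v[t] * np.sqrt(h[t])
    return eps, h

eps, h = simulate_garch11(alpha0=0.1, alpha1=0.1, beta1=0.8, n=200_000)
long_run = 0.1 / (1.0 - 0.1 - 0.8)
print(eps.var(), long_run)    # the sample variance should be close to the long-run value
print(h.min())                # the conditional variance never falls below alpha0
```

A single lag of the conditional variance lets shocks from many periods back keep influencing current volatility, which is why a low-order GARCH model can stand in for a high-order ARCH model.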

The key feature of GARCH models is that the conditional variance of the disturbances
of the $\{y_t\}$ sequence acts like an ARMA process. Hence, it is to be expected
that the residuals from a fitted ARMA model should display this characteristic pattern.
To explain, suppose you estimate $\{y_t\}$ as an ARMA process. If your model of $\{y_t\}$ is
adequate, the ACF and PACF of the residuals should be indicative of a white-noise
process. However, the ACF of the squared residuals can help identify the order of the
GARCH process. Equation (3.9) looks very much like a standard ARMA($p$, $q$) process.
As such, if there is conditional heteroskedasticity, the correlogram should be suggestive
of such a process. The technique to construct the correlogram of the squared residuals
is as follows:


STEP 1: Estimate the $\{y_t\}$ sequence using the “best-fitting” ARMA model (or regression
model) and obtain the squares of the fitted errors $\{\hat{\varepsilon}_t^2\}$. Also, calculate
the sample variance of the residuals ($\hat{\sigma}^2$) defined as

$$\hat{\sigma}^2 = \sum_{t=1}^{T} \hat{\varepsilon}_t^2 / T$$

where $T$ = number of residuals.

STEP 2: Calculate and plot the sample autocorrelations of the squared residuals as

$$\rho_i = \frac{\sum_{t=i+1}^{T} (\hat{\varepsilon}_t^2 - \hat{\sigma}^2)(\hat{\varepsilon}_{t-i}^2 - \hat{\sigma}^2)}{\sum_{t=1}^{T} (\hat{\varepsilon}_t^2 - \hat{\sigma}^2)^2}$$

STEP 3: Recall from Chapter 2 that, in large samples, the standard deviation of $\rho_i$ can
be approximated by $1/\sqrt{T}$. Individual values of $\rho_i$ that are significantly
different from zero are indicative of GARCH errors. Ljung-Box Q-statistics
can be used to test for groups of significant coefficients. As in Chapter 2, the
statistic

$$Q = T(T + 2)\sum_{i=1}^{n} \rho_i^2/(T - i)$$

has an asymptotic $\chi^2$ distribution with $n$ degrees of freedom if the $\{\hat{\varepsilon}_t^2\}$
sequence is serially uncorrelated. Rejecting the null hypothesis that the $\{\hat{\varepsilon}_t^2\}$
sequence is serially uncorrelated is equivalent to rejecting the null hypothesis
of no ARCH or GARCH errors. In practice, you should consider values
of $n$ up to $T/4$.
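The three steps above can be sketched in a few lines. The code below computes the sample autocorrelations of the squared residuals and the Ljung-Box Q-statistic, then applies them to two simulated residual series, one with constant variance and one with ARCH(1) errors. The simulated series, their parameters, and the function names are hypothetical stand-ins for the residuals of a fitted model.

```python
import numpy as np

def squared_residual_acf(resid, n_lags):
    """Sample autocorrelations rho_1, ..., rho_n of the squared residuals (Step 2)."""
    e2 = np.asarray(resid, dtype=float) ** 2
    dev = e2 - e2.mean()               # e2.mean() is the sample variance from Step 1
    denom = dev @ dev
    return np.array([(dev[i:] @ dev[:-i]) / denom for i in range(1, n_lags + 1)])

def ljung_box_q(rho, T):
    """Q = T(T + 2) * sum over i of rho_i^2 / (T - i), as in Step 3."""
    lags = np.arange(1, len(rho) + 1)
    return T * (T + 2) * np.sum(rho**2 / (T - lags))

rng = np.random.default_rng(3)
T = 1_000
white = rng.normal(size=T)             # constant-variance residuals
arch = np.zeros(T)                     # residuals with ARCH(1) errors
for t in range(1, T):
    arch[t] = rng.normal() * np.sqrt(1.0 + 0.5 * arch[t - 1] ** 2)

results = {}
for name, e in (("white", white), ("arch", arch)):
    rho = squared_residual_acf(e, n_lags=4)
    results[name] = ljung_box_q(rho, T)
print(results)
print("two-standard-deviation band for each rho_i:", 2 / np.sqrt(T))
```

Each Q-statistic would be compared with a chi-square critical value with four degrees of freedom; a large value for the ARCH series leads to rejection of the null of no ARCH or GARCH errors.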


A more formal Lagrange multiplier test for ARCH errors is the test by McLeod 
and Li (1983). The methodology involves the following two steps:* 


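STEP 1: Use OLS to estimate the most appropriate regression equation or ARMA
model and let $\{\hat{\varepsilon}_t^2\}$ denote the squares of the fitted errors.

STEP 2: Regress these squared residuals on a constant and on the $q$ lagged values
$\hat{\varepsilon}_{t-1}^2, \hat{\varepsilon}_{t-2}^2, \hat{\varepsilon}_{t-3}^2, \ldots, \hat{\varepsilon}_{t-q}^2$; that is, estimate a regression of the form

$$\hat{\varepsilon}_t^2 = \alpha_0 + \alpha_1\hat{\varepsilon}_{t-1}^2 + \alpha_2\hat{\varepsilon}_{t-2}^2 + \cdots + \alpha_q\hat{\varepsilon}_{t-q}^2$$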


If there are no ARCH or GARCH effects, the estimated values of $\alpha_1$ through $\alpha_q$
should be zero. Hence, this regression will have little explanatory power so that the
coefficient of determination (i.e., the usual $R^2$) will be quite low. Using a sample of
$T$ residuals, under the null hypothesis of no ARCH errors, the test statistic $TR^2$ converges
to a $\chi^2$ distribution with $q$ degrees of freedom. If $TR^2$ is sufficiently large,
rejection of the null hypothesis that $\alpha_1$ through $\alpha_q$ are jointly equal to zero is equivalent
to rejection of the null hypothesis of no ARCH errors. On the other hand, if $TR^2$
is sufficiently low, it is possible to conclude that there are no ARCH effects. In the
small sample sizes typically used in applied work, an F-test for the null hypothesis
$\alpha_1 = \cdots = \alpha_q = 0$ has been shown to be superior to a $\chi^2$ test. Compare the sample
value of $F$ to the values in an F-table with $q$ degrees of freedom in the numerator and
$T - q$ degrees of freedom in the denominator.
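The two-step Lagrange multiplier procedure can be sketched as follows. The function regresses the squared residuals on a constant and their own lags and returns both the statistic based on R-squared and an F-statistic. Two choices here are assumptions rather than anything prescribed by the text: the sample size is taken to be the number of usable observations after lags are dropped, and the F-statistic uses usable observations minus estimated coefficients in its denominator, which can differ slightly from the degrees of freedom quoted above. The simulated residuals and all names are hypothetical.

```python
import numpy as np

def arch_lm_test(resid, q):
    """Return (T*R^2, F) from regressing e_t^2 on a constant and q of its own lags."""
    e2 = np.asarray(resid, dtype=float) ** 2
    n = len(e2)
    y = e2[q:]
    lags = [e2[q - i : n - i] for i in range(1, q + 1)]
    X = np.column_stack([np.ones(n - q)] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ coef) ** 2)          # unrestricted sum of squared residuals
    sst = np.sum((y - y.mean()) ** 2)          # restricted model: constant only
    T = len(y)                                  # usable observations (assumption)
    r2 = 1.0 - ssr / sst
    f_stat = ((sst - ssr) / q) / (ssr / (T - q - 1))
    return T * r2, f_stat

# Illustrative residuals with ARCH(1) errors.
rng = np.random.default_rng(4)
n_obs = 2_000
arch = np.zeros(n_obs)
for t in range(1, n_obs):
    arch[t] = rng.normal() * np.sqrt(1.0 + 0.5 * arch[t - 1] ** 2)

tr2, f_stat = arch_lm_test(arch, q=4)
print(tr2, f_stat)   # compare tr2 with a chi-square critical value with 4 degrees of freedom
```

With strongly conditionally heteroskedastic residuals such as these, both statistics should be far above conventional critical values.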


3. ARCH AND GARCH ESTIMATES 
OF INFLATION 


ARCH and GARCH models have become very popular in that they enable the econo- 
metrician to estimate the variance of a series at a particular point in time. Clearly, asset 
pricing models indicate that the risk premium will depend on the expected return and 
the variance of that return. The relevant measure is the risk over the holding period, 
not the unconditional risk. Similarly, a portfolio manager who uses value-at-risk (see
the Supplementary Manual) might be unwilling to hold a portfolio with a 5% chance
of losing $1 million. The assessment of the risk should be determined using the conditional
distribution of asset returns. To use Engle’s example of the importance of using
the conditional variance rather than the unconditional variance, consider the nature of
the wage-bargaining process. Clearly, firms and unions need to forecast the inflation
rate over the duration of the labor contract. Economic theory suggests that the terms of
the wage contract will depend on the inflation forecasts and the uncertainty concerning
the accuracy of these forecasts. Let $E_t\pi_{t+1}$ denote the conditional expected rate of
inflation for $t + 1$ and let $\sigma_{\pi t}^2$ denote the conditional variance. If parties to the contract
have rational expectations, the terms of the contract will depend on $E_t\pi_{t+1}$ and $\sigma_{\pi t}^2$ as
opposed to the unconditional mean or the unconditional variance.

This example illustrates a very important point. The rational expectations hypoth- 
esis asserts that agents do not waste useful information. In forecasting any time series, 
rational agents use the conditional distribution, rather than the unconditional distribu- 
tion, of the series. Hence, any test of the wage bargaining model above that uses the 
historical variance of the inflation rate would be inconsistent with the notion that ratio- 
nal agents make use of all available information (i.e., conditional means and variances). 
Engle’s 2003 Nobel prize (shared with Clive Granger) is a testament to the importance 
of ARCH models. Theoretical models using variance as a measure of risk (such as 
mean-variance analysis) can be tested using the conditional variance. As such, the 
growth in the use of ARCH/GARCH methods has been nothing short of impressive. 
In fact, there are so many types of models of conditional volatility that it is common 
practice to refer to the entire class of models as ARCH or GARCH models. 


Engle’s Model of U.K. Inflation 


Although Section 2 focused on the residuals of a pure ARMA model, it is possible to
estimate the residuals of a standard multiple regression model as ARCH or GARCH
processes. In fact, Engle’s (1982) seminal paper considered the residuals of a simple
model of the wage/price spiral for the U.K. over the 1958Q2-1977Q2 period. Let $p_t$
denote the log of the U.K. consumer price index and $w_t$ denote the log of the index
of nominal wage rates. Thus, the rate of inflation is $\pi_t = p_t - p_{t-1}$, and the real wage
is $r_t = w_t - p_t$. Engle reports that, after some experimentation, he chose the following
model of the U.K. inflation rate $\pi_t$ (standard errors are in parentheses):

$$\pi_t = \underset{(0.006)}{0.0257} + \underset{(0.103)}{0.334}\pi_{t-1} + \underset{(0.110)}{0.408}\pi_{t-4} - \underset{(0.114)}{0.404}\pi_{t-5} + \underset{(0.014)}{0.0559}r_{t-1} + \varepsilon_t \tag{3.10}$$

where $\operatorname{var}(\varepsilon_t)$ is estimated to be the constant $8.9 \times 10^{-5}$.

The nature of the model is such that increases in the previous period’s real wage
increase the current inflation rate. Lagged inflation rates at $t - 4$ and $t - 5$ are intended
to capture seasonal factors. All coefficients have a t-statistic greater than 3.0 and a battery
of diagnostic tests did not indicate the presence of serial correlation. The estimated
variance was the constant value $8.9 \times 10^{-5}$. In testing for ARCH errors, the Lagrange
multiplier test for ARCH(1) errors was not significant but the test for an ARCH(4) error
process yielded a value of $TR^2$ equal to 15.2. At the 0.01 significance level, the critical
value of $\chi^2$ with 4 degrees of freedom is 13.28; hence, Engle concludes that there are
ARCH errors. 
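
The Lagrange multiplier test described above can be sketched in a few lines. The following is a minimal illustration, not Engle's own code: the function name `arch_lm_test` and the simulated series are hypothetical, and the test is computed by regressing squared residuals on a constant and q of their own lags, then forming $TR^2$ (together with the companion F-statistic).

```python
import numpy as np

def arch_lm_test(resid, q):
    """Return (TR^2, F) for the null hypothesis of no ARCH effects up to lag q."""
    e2 = np.asarray(resid, dtype=float) ** 2
    n = len(e2)
    y = e2[q:]  # dependent variable: squared residual at t
    # Regressors: a constant plus the squared residuals at t-1, ..., t-q
    lags = [e2[q - i:n - i] for i in range(1, q + 1)]
    X = np.column_stack([np.ones(n - q)] + lags)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((y - X @ coef) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ssr / sst
    T = n - q  # usable observations
    tr2 = T * r2  # compare with a chi-square(q) critical value
    f_stat = (r2 / q) / ((1.0 - r2) / (T - q - 1))
    return tr2, f_stat

# Illustration on simulated ARCH(1) errors (hypothetical data, not Engle's series)
rng = np.random.default_rng(0)
eps = np.zeros(2000)
for t in range(1, len(eps)):
    h_t = 1.0 + 0.5 * eps[t - 1] ** 2
    eps[t] = np.sqrt(h_t) * rng.standard_normal()

tr2, f_stat = arch_lm_test(eps, q=4)
print(f"TR^2 = {tr2:.2f}, F = {f_stat:.2f}")
```

With q = 4, the resulting $TR^2$ would be compared with the 13.28 critical value quoted above.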
Engle specified an ARCH(4) process forcing the following declining set of weights
on the errors:


$$h_t = \alpha_0 + \alpha_1\,(0.4\varepsilon_{t-1}^2 + 0.3\varepsilon_{t-2}^2 + 0.2\varepsilon_{t-3}^2 + 0.1\varepsilon_{t-4}^2) \tag{3.11}$$


The rationale for choosing a two-parameter variance function was to ensure the
nonnegativity and stationarity constraints that might not be satisfied using an unrestricted
estimating equation. Given this particular set of weights, the necessary and
sufficient conditions for the two constraints to be satisfied are $\alpha_0 > 0$ and $0 < \alpha_1 < 1$.
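
One way to see where these conditions come from is sketched below. It assumes the error process is covariance stationary with unconditional variance $\sigma^2$ (a symbol introduced only for this illustration). Because the four fixed weights sum to one, taking unconditional expectations of (3.11) gives

```latex
\begin{aligned}
\sigma^2 = E(h_t) &= \alpha_0 + \alpha_1\,(0.4 + 0.3 + 0.2 + 0.1)\,\sigma^2
                   = \alpha_0 + \alpha_1 \sigma^2 , \\
\sigma^2 &= \frac{\alpha_0}{1 - \alpha_1} .
\end{aligned}
```

A finite, positive unconditional variance therefore requires $\alpha_0 > 0$ and $\alpha_1 < 1$, while a positive $\alpha_1$ keeps every weighted squared error from pushing $h_t$ below $\alpha_0$.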

Engle shows that the estimation of the parameters of (3.10) and (3.11) can be con- 
sidered separately without loss of asymptotic efficiency. One procedure is to estimate 
(3.10) using OLS and to save the residuals. From these residuals, an estimate of the 
parameters of (3.11) can be constructed, and based on these estimates, new estimates 
of (3.10) can be obtained. To estimate both with full efficiency, continued iterations 
can be checked to determine whether the separate estimates are converging. Now that 
many statistical software packages contain nonlinear maximum likelihood estimation 
routines, the current procedure is to simultaneously estimate both equations using the 
methodology discussed in Section 8. 
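
To make the idea of simultaneous estimation concrete, the sketch below writes down a Gaussian negative log-likelihood for a linear regression whose errors follow the restricted ARCH(4) scheme with weights 0.4, 0.3, 0.2, and 0.1. It is an illustration under stated assumptions rather than Engle's procedure: the function names are hypothetical, normality of the standardized errors is assumed, and the first four observations are simply dropped from the likelihood.

```python
import numpy as np

WEIGHTS = np.array([0.4, 0.3, 0.2, 0.1])  # fixed declining weights on lags 1..4

def restricted_arch4_variance(resid, alpha0, alpha1):
    """Conditional variance h_t = alpha0 + alpha1 * (weighted sum of 4 lagged squared errors)."""
    e2 = np.asarray(resid, dtype=float) ** 2
    h = np.full(len(e2), np.nan)  # h_t is undefined for the first four observations
    for t in range(4, len(e2)):
        lagged = e2[t - 4:t][::-1]  # e_{t-1}^2, e_{t-2}^2, e_{t-3}^2, e_{t-4}^2
        h[t] = alpha0 + alpha1 * (WEIGHTS @ lagged)
    return h

def neg_loglik(params, y, X):
    """Negative Gaussian log-likelihood; params = (regression coefficients, alpha0, alpha1)."""
    k = X.shape[1]
    b, alpha0, alpha1 = params[:k], params[k], params[k + 1]
    if alpha0 <= 0 or not (0 < alpha1 < 1):
        return np.inf  # enforce the nonnegativity and stationarity constraints
    resid = y - X @ b
    h = restricted_arch4_variance(resid, alpha0, alpha1)[4:]
    e = resid[4:]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(h) + e ** 2 / h)

# Tiny check with hypothetical data: zero residuals imply h_t = alpha0 for every t
y = np.zeros(10)
X = np.ones((10, 1))
value = neg_loglik(np.array([0.0, 1.0, 0.5]), y, X)
print(value)
```

A function of this kind could be handed to a general-purpose numerical optimizer, so that the mean and variance parameters are chosen jointly rather than in separate passes.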

Engle’s maximum likelihood estimates of the model are 


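$$\pi_t = \underset{(0.005)}{0.0328} + \underset{(0.108)}{0.162}\,\pi_{t-1} + \underset{(0.089)}{0.264}\,\pi_{t-4} - \underset{(0.099)}{0.325}\,\pi_{t-5} + \underset{(0.012)}{0.0707}\,r_{t-1} + \varepsilon_t \tag{3.12}$$


$$h_t = \underset{(8.5 \times 10^{-6})}{1.4 \times 10^{-5}} + \underset{(0.298)}{0.955}\,(0.4\varepsilon_{t-1}^2 + 0.3\varepsilon_{t-2}^2 + 0.2\varepsilon_{t-3}^2 + 0.1\varepsilon_{t-4}^2)$$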


The estimated values of $h_t$ are the conditional forecast error variances. All coefficients
(except the first lag of the inflation rate) are significant at conventional levels.
For a given real wage, the point estimates of (3.12) imply that the inflation rate is a
convergent process. Using the calculated values of the $\{h_t\}$ sequence, Engle finds that
the standard deviation of inflation forecasts more than doubled as the economy moved
from the “predictable sixties into the chaotic seventies.” The point estimate of 0.955
indicates an extreme amount of volatility persistence.


Bollerslev’s Estimates of U.S. Inflation 


Bollerslev’s (1986) estimate of U.S. inflation provides an interesting comparison of 
a standard autoregressive time-series model (which assumes a constant variance), a 
model with ARCH errors, and a model with GARCH errors. He notes that the ARCH 
procedure has been useful in modeling different economic phenomena but points out 
(see pp. 307-308): 


Common to most ... applications, however, is the introduction of a rather 
arbitrary linear declining lag structure in the conditional variance equation 
to take account of the long memory typically found in empirical work, 
since estimating a totally free lag distribution often will lead to violation 
of the nonnegativity constraints. 
There is no doubt that the lag structure Engle used to model $h_t$ in (3.12) is subject to
this criticism. Using quarterly data over the 1948Q2-1983Q4 period, Bollerslev (1986)
calculated the inflation rate ($\pi_t$) as the logarithmic change in the U.S. GNP deflator. He
then estimated the autoregression (the standard errors are in parentheses):


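$$\pi_t = \underset{(0.080)}{0.240} + \underset{(0.083)}{0.552}\,\pi_{t-1} + \underset{(0.089)}{0.177}\,\pi_{t-2} + \underset{(0.090)}{0.232}\,\pi_{t-3} - \underset{(0.080)}{0.209}\,\pi_{t-4} + \varepsilon_t \tag{3.13}$$


where $\operatorname{var}(\varepsilon_t)$ is estimated to be the constant value 0.282.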

Equation (3.13) seems to have all the properties of a well-estimated time-series 
model. All coefficients are significant at conventional levels, and the estimated values 
of the autoregressive coefficients imply stationarity. Bollerslev reports that the ACF and 
PACF do not exhibit any significant correlations at the 5% significance level. However, 
as is typical of ARCH errors, the ACF and PACF of the squared residuals (i.e., $\hat{\varepsilon}_t^2$)
show significant correlations. The Lagrange multiplier tests for ARCH(1), ARCH(4), 
and ARCH(8) errors are all highly significant. 

Bollerslev next estimates the restricted ARCH(8) model originally proposed by 
Engle and Kraft (1983). By way of comparison to (3.13), he finds 


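$$\pi_t = \underset{(0.059)}{0.138} + \underset{(0.081)}{0.423}\,\pi_{t-1} + \underset{(0.108)}{0.222}\,\pi_{t-2} + \underset{(0.078)}{0.377}\,\pi_{t-3} - \underset{(0.104)}{0.175}\,\pi_{t-4} + \varepsilon_t \tag{3.14}$$

$$h_t = \underset{(0.003)}{0.058} + \underset{(0.265)}{0.802} \sum_{i=1}^{8} \left[(9 - i)/36\right] \varepsilon_{t-i}^2$$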


Note that the autoregressive coefficients of (3.13) and (3.14) are similar. The models
of the variance, however, are quite different. Equation (3.13) assumes a constant
variance, whereas (3.14) assumes that the variance ($h_t$) is a linearly declining
weighted average of the squared errors of the previous eight quarters. Hence, the inflation rate
predictions of the two models should be similar, but the confidence intervals surrounding
the forecasts will differ. Equation (3.13) yields a constant interval of unchanging
width. Equation (3.14) yields a confidence interval that expands during periods of inflation
volatility and contracts in relatively tranquil periods. Note that constraining the
coefficients of $h_t$ to follow a decaying pattern conserves degrees of freedom and considerably
eases the estimation process. Moreover, the lag weights given by $(9 - i)/36$
are necessarily positive.
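
The weighting scheme in (3.14) is easy to inspect numerically. The short sketch below builds the eight weights (9 - i)/36, confirms that they are positive, fall by a constant step of 1/36, and sum to one, and then evaluates the conditional variance for a given set of lagged errors. The function name and the input errors are hypothetical; the coefficients 0.058 and 0.802 are the estimates reported above.

```python
import numpy as np

i = np.arange(1, 9)
weights = (9 - i) / 36.0  # 8/36, 7/36, ..., 1/36

def restricted_arch8_variance(eps_lags, a0=0.058, a1=0.802):
    """h_t from the variance equation of (3.14); eps_lags = [e_{t-1}, ..., e_{t-8}]."""
    eps_lags = np.asarray(eps_lags, dtype=float)
    return a0 + a1 * np.sum(weights * eps_lags ** 2)

print(weights)                   # the declining weights
print(np.diff(weights))          # constant step between adjacent weights
print(restricted_arch8_variance(np.zeros(8)))  # tranquil period: h_t equals the intercept
print(restricted_arch8_variance(np.ones(8)))   # unit shocks in all eight lags
```

Because the weights sum to one, a run of equal-sized shocks raises $h_t$ by 0.802 times the squared shock, which is the sense in which the interval widens in volatile periods.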

Of course, the declining weight structure of 8/36, 7/36, 6/36, ... in (3.14) is 
completely arbitrary. Bollerslev goes on to estimate the following parsimonious 
GARCH(1, 1) model: 


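$$\pi_t = \underset{(0.060)}{0.141} + \underset{(0.081)}{0.433}\,\pi_{t-1} + \underset{(0.110)}{0.229}\,\pi_{t-2} + \underset{(0.077)}{0.349}\,\pi_{t-3} - \underset{(0.104)}{0.162}\,\pi_{t-4} + \varepsilon_t \tag{3.15}$$

$$h_t = \underset{(0.006)}{0.007} + \underset{(0.070)}{0.135}\,\varepsilon_{t-1}^2 + \underset{(0.068)}{0.829}\,h_{t-1}$$

Diagnostic checks indicate that the ACF and PACF of the squared residuals do not
reveal any coefficients exceeding $2T^{-0.5}$. LM tests for the presence of additional lags
of $\varepsilon_t^2$ are not significant at the 5% level.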
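
It may help to see why a GARCH(1, 1) model can stand in for a long, restricted ARCH lag. The derivation below is a standard back-substitution, written here with generic symbols $\alpha_0$, $\alpha_1$, $\beta_1$ for the three variance parameters and assuming $0 \le \beta_1 < 1$:

```latex
\begin{aligned}
h_t &= \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1} \\
    &= \alpha_0 (1 + \beta_1) + \alpha_1 \varepsilon_{t-1}^2
       + \alpha_1 \beta_1 \varepsilon_{t-2}^2 + \beta_1^2 h_{t-2} \\
    &= \frac{\alpha_0}{1 - \beta_1}
       + \alpha_1 \sum_{i=1}^{\infty} \beta_1^{\,i-1} \varepsilon_{t-i}^2 .
\end{aligned}
```

With the point estimates 0.135 and 0.829, the implied weights on $\varepsilon_{t-1}^2$, $\varepsilon_{t-2}^2$, $\varepsilon_{t-3}^2$ are roughly 0.135, 0.112, and 0.093, so the declining pattern is estimated from the data instead of being imposed.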
4. THREE EXAMPLES OF GARCH MODELS


GARCH models have found their greatest use in modeling financial data. However, 
this section and Section 5 are intended to illustrate some other uses of GARCH mod- 
els. The first example is a straightforward estimation of the variance of the price of oil. 
It is shown that uncertainty in the petroleum market is very persistent. In the second 
example, the issue is whether there has been a significant reduction in the volatility 
of real GDP. In the third example, the intent is to obtain reasonable conditional confi- 
dence intervals when forecasting. The example also shows that inference in an ARMA 
(or regression) framework can be improved by accounting for GARCH effects. The 
example in Section 5 uses a GARCH framework to measure the attitudes and behavior 
toward risk in the U.S. broiler market. 


A GARCH Model of Oil Prices 


To get a better idea of the actual process of fitting a GARCH model, it is instructive 
to work with the price of oil shown in Figure 3.6. The file OIL.XLS contains the 1382 
weekly values of the spot price of a barrel of Brent crude over the period May 15, 1987, 
to November 1, 2013. Use the data set to create the logarithmic change in the price of
oil as $p_t = 100.0 \times [\log(spot_t) - \log(spot_{t-1})]$. If you experiment with several ARMA
models, you should find that the following MA model works well:


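$$p_t = \underset{(0.90)}{0.127} + \varepsilon_t + \underset{(6.72)}{0.177}\,\varepsilon_{t-1} + \underset{(3.60)}{0.095}\,\varepsilon_{t-3}$$

The ACF of the residuals is

$\rho_1$   $\rho_2$   $\rho_3$   $\rho_4$   $\rho_5$   $\rho_6$   $\rho_7$   $\rho_8$
0.002   0.013   -0.002   0.009   -0.013   -0.008   0.010   0.005


Although the residuals are not serially correlated, the ACF of the squared residuals is

$\rho_1$   $\rho_2$   $\rho_3$   $\rho_4$   $\rho_5$   $\rho_6$   $\rho_7$   $\rho_8$
0.18   0.17   0.14   0.16   0.12   0.15   0.18   0.15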


If you conduct the McLeod—Li (1983) test for ARCH errors using four lags, you 
should obtain 


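$$\hat{\varepsilon}_t^2 = 9.68 + 0.13\,\hat{\varepsilon}_{t-1}^2 + 0.11\,\hat{\varepsilon}_{t-2}^2 + 0.08\,\hat{\varepsilon}_{t-3}^2 + 0.11\,\hat{\varepsilon}_{t-4}^2$$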


The sample value of the F-statistic for the null hypothesis that the coefficients $\alpha_1$
through $\alpha_4$ all equal zero is 26.42. With 4 numerator and 1372 denominator degrees of
freedom, we reject the null hypothesis of no ARCH errors at any conventional signifi- 
cance level. If you are worried about the break shown in Figure 3.6 and include a break 
dummy at July 11, 2008 (see Question 12 at the end of the chapter), you should find 
that it is not significant. 

It is generally best to begin with a very simple specification for the variance such 
as an ARCH(1) or a GARCH(1, 1) model. If you begin by estimating a GARCH(1, 1) 
model for the conditional variance, you should find that the MA(3) term in the model
of the mean is not significant. If you reestimate the model without the $\varepsilon_{t-3}$ term in the
mean equation, you should obtain


$$p_t = 0.130 + \varepsilon_t + 0.225\,\varepsilon_{t-1}$$
$$h_t = 0.402 + 0.097\,\varepsilon_{t-1}^2 + 0.881\,h_{t-1}$$


In order to check for model adequacy, it is possible to form the standardized residuals
and the squared standardized residuals as $\hat{\varepsilon}_t/\hat{h}_t^{0.5}$ and $\hat{\varepsilon}_t^2/\hat{h}_t$, respectively. In essence,
you standardize each estimated residual ($\hat{\varepsilon}_t$) by its own conditional standard deviation
($\hat{h}_t^{0.5}$) and each squared residual by its own conditional variance. Another way to
view the standardized residuals is to reconsider equation (3.9). The estimated value of
$\hat{\varepsilon}_t/\hat{h}_t^{0.5}$ is an estimate of $v_t$. The estimated $v_t$ series needs to be serially uncorrelated
with a constant variance approximately equal to unity.
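
The standardization step can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the variance recursion is started at the sample variance of the residuals (one common but arbitrary choice), and the data are simulated from a GARCH(1, 1) process using the three variance coefficients reported above rather than taken from OIL.XLS.

```python
import numpy as np

def garch11_variance(resid, a0, a1, b1):
    """Filter the conditional variance h_t = a0 + a1*e_{t-1}^2 + b1*h_{t-1}."""
    e = np.asarray(resid, dtype=float)
    h = np.empty(len(e))
    h[0] = e.var()  # assumption: initialize at the sample variance
    for t in range(1, len(e)):
        h[t] = a0 + a1 * e[t - 1] ** 2 + b1 * h[t - 1]
    return h

def standardize(resid, h):
    """Return (e_t / h_t^0.5, e_t^2 / h_t)."""
    e = np.asarray(resid, dtype=float)
    return e / np.sqrt(h), e ** 2 / h

# Simulated GARCH(1, 1) errors (hypothetical data)
a0, a1, b1 = 0.402, 0.097, 0.881
rng = np.random.default_rng(1)
T = 5000
h_true = np.empty(T)
eps = np.empty(T)
h_true[0] = a0 / (1.0 - a1 - b1)  # start at the unconditional variance
eps[0] = np.sqrt(h_true[0]) * rng.standard_normal()
for t in range(1, T):
    h_true[t] = a0 + a1 * eps[t - 1] ** 2 + b1 * h_true[t - 1]
    eps[t] = np.sqrt(h_true[t]) * rng.standard_normal()

h_hat = garch11_variance(eps, a0, a1, b1)
v_hat, v2_hat = standardize(eps, h_hat)
print(v_hat.var())  # should be close to one if the variance model is adequate
```

The two returned series are the ones whose autocorrelations are examined when checking for remaining serial correlation and remaining GARCH effects.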

The autocorrelations of the standardized residuals and standardized squared resid- 
uals are 


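Correlations   $\rho_1$   $\rho_2$   $\rho_3$   $\rho_4$   $\rho_5$   $\rho_6$   $\rho_7$   $\rho_8$
$\hat{\varepsilon}_t/\hat{h}_t^{0.5}$   0.05   -0.01   0.01   0.01   -0.04   -0.01   -0.00   -0.01
$\hat{\varepsilon}_t^2/\hat{h}_t$   0.00   0.00   -0.00   -0.01   -0.02   -0.01   -0.01   -0.00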


The Q-statistics for correlations in the $\hat{\varepsilon}_t/\hat{h}_t^{0.5}$ series are Q(4) = 3.73 and Q(8) =
6.16. Neither value is significant at conventional levels, so we cannot reject
the null hypothesis of no remaining serial correlation. Similarly, the Q(4)- and
Q(8)-statistics for serial correlation in the $\hat{\varepsilon}_t^2/\hat{h}_t$ series are 0.00 and 1.36, respectively.
Since these are not significant, we cannot reject the null hypothesis of no remaining
GARCH effects. Instead of using the Q-statistics, you could check for remaining serial
correlation using a model of the form:


$$\hat{\varepsilon}_t/\hat{h}_t^{0.5} = a_0 + a_1\,\hat{\varepsilon}_{t-1}/\hat{h}_{t-1}^{0.5} + \cdots + a_n\,\hat{\varepsilon}_{t-n}/\hat{h}_{t-n}^{0.5}$$


In small samples, it is common to use an F-test (instead of a $\chi^2$ test) to determine
whether the autocorrelations are jointly significant. If you use four lags, you should
find that the F-statistic for the null hypothesis $a_1 = a_2 = a_3 = a_4 = 0$ is 0.951 with a
prob-value of 0.43. Again, you can conclude that there is not any significant correlation
in the standardized residuals.

Given that the coefficients of the GARCH model sum to nearly one (0.097 + 0.881 =
0.978), the conditional volatility is highly persistent. As such, we should anticipate
that any shock creating uncertainty in the oil market should show little tendency to
dissipate.
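
A rough way to quantify this persistence is the half-life of a volatility shock. The calculation below is a sketch that relies on a standard property of the GARCH(1, 1) model, namely that the gap between the j-step-ahead variance forecast and the long-run variance shrinks by the factor $\alpha_1 + \beta_1$ each period; the symbol $j^*$ is introduced only for this illustration.

```latex
\begin{aligned}
(\alpha_1 + \beta_1)^{j^*} &= 0.5, \qquad \alpha_1 + \beta_1 = 0.097 + 0.881 = 0.978, \\
j^* &= \frac{\ln 0.5}{\ln 0.978} \approx \frac{-0.6931}{-0.0222} \approx 31 \text{ weeks}.
\end{aligned}
```

On these point estimates, it takes on the order of seven months for half of a shock to the conditional variance of oil price changes to die out.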


Volatility Moderation 


There is a large body of literature indicating that the volatility of important macroeco- 
nomic variables in the industrialized economies decreased in early 1984. For example, 
Stock and Watson (2002) reported that the standard deviation of real U.S. GDP growth 
during the 1984—2002 period was 61% smaller than that during the 1960-1983 period. 
As discussed in Romer (1999), some have argued that better monetary policies enabled
central bankers to better stabilize economic activity. Others have argued that it is a mat- 
ter of luck that there had not been any major negative supply shocks (such as oil price 
shocks or widespread failures) since the 1970s. Although this so-called “Great Mod- 
eration” came to an end with the financial crisis of 2008, we can use the GARCH 
framework to test whether or not there was a volatility break in 1984Q1. 

The file RGDP.XLS contains the four series that were used to construct Figures 3.1 
and 3.2. You can use the data in the file to construct the growth rate of real U.S. 
GDP as $y_t = \log(RGDP_t/RGDP_{t-1})$. Without going into detail, if you worked through
Chapter 2, it should be clear that a reasonable model for the growth rate of real
GDP is

$$y_t = \underset{(6.80)}{0.005} + \underset{(6.44)}{0.371}\,y_{t-1} + \varepsilon_t$$


Although the ACF of the residuals is such that p, = 0.12, the Ljung—Box Q(4) and 
Q(8) statistics of 5.48 and 9.98, respectively, are not statistically significant. The issue 
is to measure the extent of the volatility break in 1984Q1. As a preliminary test, we can 
try to determine if there is any conditional volatility. Since we are using quarterly data, 
it makes sense to use the McLeod—Li (1983) test with a four-quarter lag. Consider 


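$$\hat{\varepsilon}_t^2 = 5.56 \times 10^{-5} + 0.116\,\hat{\varepsilon}_{t-1}^2 + 0.127\,\hat{\varepsilon}_{t-2}^2 - 0.029\,\hat{\varepsilon}_{t-3}^2 + 0.123\,\hat{\varepsilon}_{t-4}^2$$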


The sample value of the F-statistic for the null hypothesis that the coefficients $\alpha_1$
through $\alpha_4$ all equal zero is 3.48. With 4 numerator and 253 denominator degrees of
freedom, this is significant at the 0.009 level. Hence, there is strong evidence that the
$\{y_t\}$ series exhibits conditional volatility.

Now create the dummy variable $D_t$ that is equal to 1 beginning in 1984Q1 and is
equal to 0 prior to 1984Q1. If you estimate the $y_t$ series allowing for ARCH(1) errors
and include $D_t$ in the variance equation, you should find


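$$y_t = \underset{(7.50)}{0.004} + \underset{(6.76)}{0.398}\,y_{t-1} + \varepsilon_t$$


$$h_t = \underset{(7.87)}{1.10 \times 10^{-4}} + \underset{(2.89)}{0.182}\,\varepsilon_{t-1}^2 - \underset{(-6.14)}{8.76 \times 10^{-5}}\,D_t$$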


Although it is statistically significant, the magnitude of the ARCH(1) term is such
that there is only a small amount of volatility persistence. Given that the coefficient on
$D_t$ is statistically different from zero, we can conclude that there is a volatility break
in 1984. Notice that the intercept of the variance equation was $1.10 \times 10^{-4}$ prior to
1984Q1 and experienced a significant decline to about $2.2 \times 10^{-5}$ ($= 1.10 \times 10^{-4} - 8.76 \times
10^{-5}$) beginning in 1984Q1. In variance terms, this estimated decline of roughly 80% is even
greater than the 61% decline in the standard deviation reported by Stock and Watson (2002).
Question 8 asks you to experiment with alternative
models including one for the effects of the financial crisis.
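
The size of the break can be worked out directly from the reported variance equation. The arithmetic below is only a back-of-the-envelope sketch using the rounded coefficients; because the ARCH(1) coefficient is the same in both regimes, the ratio of intercepts is also the ratio of the implied unconditional variances.

```latex
\begin{aligned}
\frac{1.10 \times 10^{-4} - 8.76 \times 10^{-5}}{1.10 \times 10^{-4}}
   &= \frac{2.24 \times 10^{-5}}{1.10 \times 10^{-4}} \approx 0.20
   && \text{(variance falls by about 80\%)} \\
\sqrt{0.20} &\approx 0.45
   && \text{(standard deviation falls by about 55\%)}
\end{aligned}
```

The comparison with the Stock and Watson figure therefore depends on whether the decline is expressed in terms of the variance or the standard deviation.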


A GARCH Model of the Spread 


To take a more difficult example of fitting a GARCH model, reconsider the estimates of
the interest rate spread used in the last chapter. Recall that the Box-Jenkins approach
led us to give serious consideration to the ARMA[2,(1,7)] model. If you estimate the
model for the entire 1960Q3-2012Q4 period, you should obtain


$$s_t = \underset{(6.00)}{1.215} + \underset{(3.33)}{0.373}\,s_{t-1} + \underset{(3.50)}{0.372}\,s_{t-2} + \varepsilon_t + \underset{(9.60)}{0.762}\,\varepsilon_{t-1} - \underset{(-3.23)}{0.141}\,\varepsilon_{t-7} \tag{3.16}$$


As shown in Chapter 2, the estimated model performs quite well. All estimated 
parameters are significant at conventional levels, and both the AIC and SBC selected 
this specification. The Ljung—Box Q-statistics for serial correlation using lags of 4, 
8, and 12 quarters are not significant at conventional levels. Moreover, there was no 
evidence of structural change in the estimated coefficients. Nevertheless, during the 
very late 1970s and early 1980s, there was a period of unusual volatility that could be 
indicative of a GARCH process. The aim of this section is to illustrate a step-by-step 
analysis of a GARCH estimation of the spread. You should be able to follow along 
using the data in the file labeled QUARTERLY.XLS. 


Formal Tests for ARCH Errors 


Although (3.16) appears to be quite reasonable, the volatility during the 1970s 
suggests that it is prudent to examine the ACF and PACF of the squared residuals. 
The autocorrelations of the squared residuals are such that $\rho_1 = 0.043$, $\rho_2 = 0.179$,
$\rho_3 = 0.178$, $\rho_4 = 0.319$, and $\rho_7 = 0.373$. Other values for $\rho_i$ are generally 0.14 or less.
The Ljung-Box Q-statistics for the squared residuals are highly significant at any
conventional level; for example, Q(4) = 35.98 and Q(8) = 71.75.
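
For readers who want to reproduce statistics of this kind, the Ljung-Box Q-statistic can be computed from the sample autocorrelations as sketched below. The function name is hypothetical and the input series is a toy example; to test for ARCH-type effects, the function would be applied to the squared residuals.

```python
import numpy as np

def ljung_box_q(x, max_lag):
    """Ljung-Box Q = T(T+2) * sum_{k=1}^{max_lag} r_k^2 / (T - k)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    denom = np.sum(d ** 2)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = np.sum(d[k:] * d[:-k]) / denom  # sample autocorrelation at lag k
        total += r_k ** 2 / (T - k)
    return T * (T + 2) * total

# Toy check: an alternating series has a lag-1 autocorrelation of -0.9 when T = 10
x = np.array([1.0, -1.0] * 5)
q1 = ljung_box_q(x, 1)
print(q1)

# Typical use on residuals e (hypothetical array): ljung_box_q(e ** 2, 4)
```

Under the null hypothesis of no autocorrelation, Q with n lags is compared with a $\chi^2$ critical value (the degrees of freedom are reduced when the series consists of residuals from an estimated ARMA model).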

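Next, let $\hat{\varepsilon}_t$ denote the residuals of (3.16) and consider the McLeod-Li (1983) test
using a lag length of seven quarters:


$$\begin{aligned}
\hat{\varepsilon}_t^2 = {} & \underset{(1.82)}{0.08} - \underset{(-0.29)}{0.02}\,\hat{\varepsilon}_{t-1}^2 + \underset{(2.07)}{0.14}\,\hat{\varepsilon}_{t-2}^2 + \underset{(1.30)}{0.09}\,\hat{\varepsilon}_{t-3}^2 + \underset{(3.85)}{0.26}\,\hat{\varepsilon}_{t-4}^2 \\
& - \underset{(-0.35)}{0.02}\,\hat{\varepsilon}_{t-5}^2 - \underset{(-1.34)}{0.09}\,\hat{\varepsilon}_{t-6}^2 + \underset{(4.34)}{0.30}\,\hat{\varepsilon}_{t-7}^2
\end{aligned} \tag{3.17}$$

The value of $TR^2$ = 46.17 so that there is strong evidence of ARCH errors; with
7 degrees of freedom, the 5% critical value of $\chi^2$ is 14.1, and the 1% critical value
is 18.5. In practice, it is typical to use an F-test to determine whether it is possible to
reject the restriction $\alpha_1 = \alpha_2 = \cdots = \alpha_7 = 0$. In (3.17) with q = 7, the sample value of
F is 8.20; with 7 degrees of freedom in the numerator and 195 in the denominator, this
is highly significant.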
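
The two test statistics are tied together through the $R^2$ of the auxiliary regression. As a rough check, assume $T = 203$ usable observations; this number is not stated in the text and is inferred here from the 195 denominator degrees of freedom plus the 8 estimated coefficients.

```latex
\begin{aligned}
R^2 &= \frac{TR^2}{T} = \frac{46.17}{203} \approx 0.2274, \\
F &= \frac{R^2 / q}{(1 - R^2)/(T - q - 1)}
   = \frac{0.2274 / 7}{0.7726 / 195} \approx 8.20 .
\end{aligned}
```

Under that assumption the F-statistic implied by $TR^2$ matches the reported sample value.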


At this point, you might be tempted to plot the ACF and PACF of the squared 
residuals and estimate the squared residuals using Box—Jenkins methods. In this way, 
a parsimonious model of the error process could be obtained. Also, you might be concerned
that some of the coefficients in (3.17) are negative and try to reestimate the
equation using some other value for q. However, a word of caution is in order. The
problem with this strategy is that (3.16) was estimated under the assumption that the 
conditional variance was constant. Moreover, equations such as (3.17) can tell you 
whether or not there are GARCH errors but not the precise order of p and/or q. 


Alternative Estimates of the Model 


The appropriate way to obtain the proper order of the GARCH process is to estimate
the model of the spread and the model of the conditional variance simultaneously. As
such, GARCH processes are typically estimated by maximum likelihood techniques
so as to obtain estimates that are fully efficient. A low-order ARCH(q) process seems
like a reasonable starting place for a model of the conditional variance. Even though the
coefficient of $\hat{\varepsilon}_{t-7}^2$ in (3.17) is significant, it is not a good idea to begin with a highly
parameterized ARCH(7) model. As noted by Bollerslev (1986), the GARCH(1, 1) specification
can mimic the properties of a high-order ARCH process. Consider


$$s_t = \underset{(2.86)}{0.192} + \underset{(4.02)}{0.514}\,s_{t-1} + \underset{(2.55)}{0.304}\,s_{t-2} + \varepsilon_t + \underset{(8.08)}{0.686}\,\varepsilon_{t-1} - \underset{(-2.65)}{0.130}\,\varepsilon_{t-7}$$

$$h_t = \underset{(1.93)}{0.017} + \underset{(3.56)}{0.233}\,\varepsilon_{t-1}^2 + \underset{(11.42)}{0.697}\,h_{t-1}$$


The model seems to be quite plausible. All of the slope coefficients are sensible and
are highly significant. Although the intercept in the $h_t$ equation is not significant at the
5% level, you do not want to eliminate this term; without an intercept, the conditional
volatility could be zero. The autoregressive coefficients in the model of the mean imply
convergence. The coefficients in the $h_t$ equation are both positive and the sum $\alpha_1 +
\beta_1$ is less than unity. Now, form the standardized errors as the residuals divided by
their conditional standard deviations; that is, form the series $\hat{\varepsilon}_t/(h_t)^{0.5}$. The estimated
standardized residuals are an estimate of the $v_t$ series. If you check for serial correlation
in the standardized residuals, you will find that the autocorrelations are such that


$\rho_1$   $\rho_2$   $\rho_3$   $\rho_4$   $\rho_5$   $\rho_6$   $\rho_7$   $\rho_8$
0.04   0.01   0.02   0.07   -0.06   -0.14   -0.01   0.02


The Q-statistics are Q(4) = 1.47 and Q(8) = 7.07 so that we can be confident that 
there is no remaining serial correlation in the standardized residuals. Now, the issue is 
whether or not the GARCH(1, 1) specification is sufficient to capture all of the dynam- 
ics in the conditional variance. To answer this question, form the autocorrelations of 
the squared standardized residuals: 


$\rho_1$   $\rho_2$   $\rho_3$   $\rho_4$   $\rho_5$   $\rho_6$   $\rho_7$   $\rho_8$
-0.13   0.16   0.00   0.05   0.00   -0.06   0.14   -0.03


Although the values of $\rho_1$ and $\rho_2$ are reasonably large, we can formally test for
remaining GARCH errors using the McLeod-Li (1983) test. If you use two lags of the
standardized squared residuals, you should obtain


$$\hat{\varepsilon}_t^2/h_t = \underset{(6.11)}{0.95} - \underset{(-1.57)}{0.11}\,\hat{\varepsilon}_{t-1}^2/h_{t-1} + \underset{(2.10)}{0.14}\,\hat{\varepsilon}_{t-2}^2/h_{t-2}$$


The value of $TR^2$ is 7.66; with 2 degrees of freedom, the 5% critical value of $\chi^2$ is
5.99, and the 2.5% critical value is 7.38. As such, we can reject the null hypothesis of no
remaining GARCH effects. To improve on the small sample properties of the $\chi^2$ test,
you can test the joint restriction that the coefficients of $\hat{\varepsilon}_{t-1}^2/h_{t-1}$ and $\hat{\varepsilon}_{t-2}^2/h_{t-2}$ equal zero using
an F-test. The sample value of F is 3.92. With 2 numerator degrees of freedom and
205 denominator degrees of freedom, the significance level of the test is 2.1%. Again,
we reject the null hypothesis of no remaining GARCH effects and try several other
specifications for the $h_t$ equation.

If you try to capture any remaining GARCH effects by estimating a GARCH(1, 2)
or a GARCH(2, 1) model, you should find that both the models are unsatisfactory.
Specifically, the coefficient of $\varepsilon_{t-2}^2$ is negative and insignificant in the GARCH(1, 2)
model and the coefficient of $h_{t-2}$ is negative in the GARCH(2, 1) model. At this point,
there is an important decision to make. Some researchers might stop here and
settle for the model at hand. This group might be particularly concerned about overfitting
the data since the model of the mean seems reasonable and the $h_t$ equation captures
most of the conditional volatility in a reasonably parsimonious way. Others might go
on to eliminate any remaining conditional volatility. For our purposes, it is instructive
to try an ARCH(2) specification as an alternative to the GARCH(1, 1) reported above.
If you estimate the ARCH(2) model, you should find


$$s_t = \underset{(6.46)}{0.307} + \underset{(17.84)}{0.586}\,s_{t-1} + \underset{(5.49)}{0.151}\,s_{t-2} + \varepsilon_t + \underset{(21.66)}{0.688}\,\varepsilon_{t-1} - \underset{(-2.71)}{0.112}\,\varepsilon_{t-7}$$

$$h_t = \underset{(8.26)}{0.115} + \underset{(1.16)}{0.071}\,\varepsilon_{t-1}^2 + \underset{(3.35)}{0.387}\,\varepsilon_{t-2}^2$$


Notice that the coefficient on $\varepsilon_{t-1}^2$ has a very small t-statistic. However, it hardly
makes sense to eliminate this term while retaining the second lagged term. A second
problem with the ARCH(2) model is that it does not capture all of the conditional
volatility in the spread. The autocorrelations of the standardized errors and the squared
standardized errors are given by


Correlations   $\rho_1$   $\rho_2$   $\rho_3$   $\rho_4$   $\rho_5$   $\rho_6$   $\rho_7$   $\rho_8$
$\varepsilon_t/h_t^{0.5}$   0.05   0.09   0.08   0.08   -0.02   -0.09   -0.02   0.04
$\varepsilon_t^2/h_t$   -0.02   -0.04   0.21   0.21   0.01   -0.01   -0.24   -0.08


Although the correlations of the standardized residuals are small, some of those 
for the standardized squared residuals are rather large. If you perform the formal test 
for remaining GARCH errors, you should obtain 


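$$\hat{\varepsilon}_t^2/h_t = \underset{(8.09)}{0.67} - \underset{(-0.91)}{0.06}\,\hat{\varepsilon}_{t-1}^2/h_{t-1} - \underset{(-0.42)}{0.03}\,\hat{\varepsilon}_{t-2}^2/h_{t-2} + \underset{(3.07)}{0.21}\,\hat{\varepsilon}_{t-3}^2/h_{t-3} + \underset{(3.25)}{0.22}\,\hat{\varepsilon}_{t-4}^2/h_{t-4}$$


Given that the value of $TR^2$ is 18.82 and that the F-statistic for the null hypothesis
that all coefficients on the $\hat{\varepsilon}_{t-i}^2/h_{t-i}$ terms jointly equal zero is 5.50, we can conclude that
the ARCH(2) specification is not adequate. In order to capture some of the remaining
serial correlation in the standardized squared residuals (i.e., the $\hat{\varepsilon}_t^2/h_t$ series), if you try
an ARCH(3) model, you should find


$$s_t = \underset{(5.95)}{0.222} + \underset{(19.50)}{0.588}\,s_{t-1} + \underset{(5.25)}{0.194}\,s_{t-2} + \varepsilon_t + \underset{(25.48)}{0.700}\,\varepsilon_{t-1} - \underset{(-5.99)}{0.157}\,\varepsilon_{t-7}$$

$$h_t = \underset{(6.09)}{0.069} + \underset{(1.23)}{0.068}\,\varepsilon_{t-1}^2 + \underset{(3.91)}{0.374}\,\varepsilon_{t-2}^2 + \underset{(2.89)}{0.271}\,\varepsilon_{t-3}^2$$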
The autocorrelations of the standardized errors and the squared standardized errors
are given by


Correlations   $\rho_1$   $\rho_2$   $\rho_3$   $\rho_4$   $\rho_5$   $\rho_6$   $\rho_7$   $\rho_8$
$\varepsilon_t/h_t^{0.5}$   -0.05   0.07   0.04   0.07   -0.05   -0.11   0.04   -0.12
$\varepsilon_t^2/h_t$   -0.07   -0.03   -0.05   0.15   -0.04   -0.03   -0.15   0.03


You should be able to show that there is no remaining serial correlation. For
example, if you perform the test for remaining serial correlation with four lags, the
value of $TR^2$ is 6.29 and the sample value of F is 1.58. As such, the ARCH(3) model
seems to do quite well. At this point, you can compare the GARCH(1, 1) to the 
ARCH(3) model. Both yield reasonable coefficient estimates, although the ARCH(3) 
is superior in that it captures all of the conditional volatility. If we use the model 
selection criteria, the AIC and SBC values for the GARCH(1, 1) model are 247.91 
and 274.68 while those for the ARCH(3) are 243.91 and 274.03, respectively. Thus, 
the ARCH(3) also yields a better fit than the GARCH(1, 1). 

Although the model of the mean implies that the spread is reasonably persistent
(the sum of the autoregressive coefficients is 0.782), for reasons discussed in Section 10
of Chapter 2, we do not want to use the first difference of the spread. The solid line
in Figure 3.8 shows the one-step-ahead forecast $E_t s_{t+1}$ using the ARMA[2, (1, 7)]
model with ARCH(3) errors. Since $h_t$ is an estimate of the conditional variance of $s_t$,
$(h_{t+1})^{0.5}$ is the standard error of the one-step-ahead forecast. The dashed lines in the
figure represent a band of $\pm 2(h_{t+1})^{0.5}$ surrounding the one-step-ahead forecast of $s_{t+1}$.
In contrast to the assumption of a constant conditional variance, note that the bandwidth
increases in the late 1970s through the mid-1980s.

FIGURE 3.8 Forecasts of the Spread
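
The construction of the forecast band can be sketched as follows. The variance coefficients are the ARCH(3) estimates reported above; the function names, the three residuals, and the point forecast passed in are hypothetical placeholders, so this is an illustration of the mechanics rather than a reproduction of Figure 3.8.

```python
import numpy as np

def arch3_next_variance(last_three_resid, a0=0.069, a=(0.068, 0.374, 0.271)):
    """h_{t+1} = a0 + a1*e_t^2 + a2*e_{t-1}^2 + a3*e_{t-2}^2; input is [e_t, e_{t-1}, e_{t-2}]."""
    e = np.asarray(last_three_resid, dtype=float)
    return a0 + float(np.dot(a, e ** 2))

def forecast_band(point_forecast, h_next, width=2.0):
    """Band of +/- width conditional standard errors around the one-step-ahead forecast."""
    se = np.sqrt(h_next)
    return point_forecast - width * se, point_forecast + width * se

# Tranquil period: recent residuals near zero give the narrowest possible band
h_calm = arch3_next_variance([0.0, 0.0, 0.0])
# Volatile period: large recent residuals widen the band
h_wild = arch3_next_variance([1.0, 1.0, 1.0])

print(forecast_band(1.0, h_calm))
print(forecast_band(1.0, h_wild))
```

The point forecast is the same in both calls; only the width of the band responds to the recent squared residuals.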


5. A GARCH MODEL OF RISK 


An interesting application of GARCH modeling is provided by Holt and Aradhyula 
(1990). Their theoretical framework stands in contrast to the cobweb model (see 
Section 5 of Chapter 1) in that rational expectations are assumed to prevail in the 
agricultural sector. The aim of the study is to examine the extent to which producers 
in the U.S. broiler (i.e., chicken) industry exhibit risk-averse behavior. To this end, the 
supply function for the U.S. broiler industry takes the form 


$$q_t = \alpha_0 + \alpha_1 p_t^e - \alpha_2 h_t - \alpha_3\,\mathit{pfeed}_{t-1} + \alpha_4\,\mathit{hatch}_{t-1} + \alpha_5 q_{t-4} + \varepsilon_{1t} \tag{3.18}$$


where: $q_t$ = quantity of broiler production (in millions of pounds) in t
$p_t^e$ = expected real price of broilers at t conditioned on the information at
t - 1 (so that $p_t^e = E_{t-1}p_t$)
$h_t$ = expected variance of the price of broilers in t conditioned on the
information at t - 1
$\mathit{pfeed}_{t-1}$ = real price of broiler feed (in cents per pound) at t - 1
$\mathit{hatch}_{t-1}$ = hatch of broiler-type chicks in commercial hatcheries (measured in
thousands) in period t - 1
$\varepsilon_{1t}$ = supply shock in t


and the length of the time period is one quarter. Note that seasonal dummy variables 
were also included in the model. 

The supply function is based on the biological fact that the production cycle of 
broilers is about 2 months. Since bimonthly data are unavailable, the model assumes 
that the supply decision is positively related to the price expectation formed by produc- 
ers in the previous quarter. Given that feed accounts for the bulk of production costs, 
real feed prices that lagged one quarter are negatively related to broiler production in t. 
Obviously, the hatch available in t - 1 increases the number of broilers that can be marketed
in t. The fourth lag of broiler production is included to account for the possibility
that production in any period may not fully adjust to the desired level of production. 

For our purposes, the most interesting part of the study is the negative effect of 
the conditional variance of price on broiler supply. The timing of the production pro- 
cess is such that feed and other production costs must be incurred before output is sold 
in the market. In the planning stage, producers must forecast the price that will prevail 
2 months hence. The greater the $p_t^e$, the greater the number of chicks that will be fed and
brought to market. If price variability is very low, these forecasts can be held with confi- 
dence. Increased price variability decreases the accuracy of the forecasts and decreases 
broiler supply. Risk-averse producers will opt to raise and market fewer broilers when 
the conditional volatility of price is high. 
In the initial stage of the study, broiler prices are estimated as the AR(4) process:

$$(1 - \beta_1 L - \beta_2 L^2 - \beta_3 L^3 - \beta_4 L^4)\,p_t = \beta_0 + \varepsilon_{2t} \tag{3.19}$$


Ljung-Box Q-statistics for various lag lengths indicate that the residual series
appear to be white noise at the 5% level. However, the Ljung-Box Q-statistic for the
squared residuals (that is, the $\{\varepsilon_{2t}^2\}$) of 32.4 is significant at the 5% level. Thus, Holt
and Aradhyula conclude that the variance of the price is conditionally heteroskedastic.

In the second stage of the study, several low-order GARCH estimates of (3.19) 
are compared. Goodness-of-fit statistics and significance tests suggest a GARCH(1, 1) 
process. In the third stage, the supply equation (3.18) and a GARCH(1, 1) process 
are simultaneously estimated. The estimated price equation (with standard errors in 
parentheses) is 


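$$(1 - \underset{(0.092)}{0.511}L - \underset{(0.098)}{0.129}L^2 - \underset{(0.094)}{0.130}L^3 - \underset{(0.073)}{0.138}L^4)\,p_t = \underset{(1.347)}{1.632} + \varepsilon_{2t} \tag{3.20}$$

$$h_t = \underset{(0.747)}{1.353} + \underset{(0.80)}{0.162}\,\varepsilon_{2t-1}^2 + \underset{(0.175)}{0.591}\,h_{t-1} \tag{3.21}$$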


Equations (3.20) and (3.21) are well-behaved in that (1) all estimated coefficients 
are significant at conventional significance levels; (2) all coefficients of the conditional 
variance equation are positive; and (3) the coefficients all imply convergent processes. 

Holt and Aradhyula assume that producers use (3.20) and (3.21) to form their price 
expectations. Combining these estimates with (3.18) yields the supply equation 


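$$q_t = \underset{(0.585)}{2.767}\,p_t^e - \underset{(0.344)}{0.521}\,h_t - \underset{(1.463)}{4.325}\,\mathit{pfeed}_{t-1} + \underset{(0.205)}{1.887}\,\mathit{hatch}_{t-1} + \underset{(0.065)}{0.603}\,q_{t-4} + \varepsilon_{1t}$$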


All estimated coefficients are significant at conventional levels and have the 
appropriate sign. An increase in the expected price increases broiler output. Increased 
uncertainty, as measured by conditional variance, acts to decrease output. This 
forward-looking rational expectations formulation is at odds with the more traditional 
cobweb model discussed in Chapter 1. In order to compare the two formulations, 
Holt and Aradhyula (1990) also considered an adaptive expectations formulation (see 
Exercise 2 in Chapter 1). Under adaptive expectations, price expectations are formed 
according to a weighted average of the previous period’s price and the previous 
period’s price expectation: 


$$p_t^e = \alpha p_{t-1} + (1 - \alpha)\,p_{t-1}^e$$


or, solving for $p_t^e$ in terms of the $\{p_t\}$ sequence, we obtain


$$p_t^e = \alpha \sum_{i=0}^{\infty} (1 - \alpha)^i p_{t-1-i}$$
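
The step from the recursive form to the infinite sum can be filled in by repeated substitution. The sketch below assumes $0 < \alpha \le 1$ and bounded past expectations, so that the weight on the remote expectation vanishes; these restrictions are the usual ones but are stated here as assumptions.

```latex
\begin{aligned}
p_t^e &= \alpha p_{t-1} + (1-\alpha)\,p_{t-1}^e \\
      &= \alpha p_{t-1} + (1-\alpha)\left[\alpha p_{t-2} + (1-\alpha)\,p_{t-2}^e\right] \\
      &= \alpha \sum_{i=0}^{n-1} (1-\alpha)^i p_{t-1-i} + (1-\alpha)^n p_{t-n}^e
      \;\longrightarrow\; \alpha \sum_{i=0}^{\infty} (1-\alpha)^i p_{t-1-i}
      \quad (n \to \infty).
\end{aligned}
```

The weights $\alpha(1-\alpha)^i$ decline geometrically and sum to one, so the expected price is a weighted average of past prices.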


Similarly, the adaptive expectations formulation for conditional risk is given by 


$$h_t = \beta \sum_{i=0}^{\infty} (1 - \beta)^i \left(p_{t-1-i} - p_{t-1-i}^e\right)^2 \tag{3.22}$$


where $0 < \beta < 1$ and $(p_{t-1-i} - p_{t-1-i}^e)^2$ is the forecast-error variance for period $t-1-i$.

Note that, in (3.22), the expected measure of risk as viewed by producers is not
necessarily the actual conditional variance. The estimates of the two models differ con- 
cerning the implied long-run elasticities of supply with respect to expected price and 
conditional variance. Respectively, the estimated long-run elasticities of supply with 
respect to expected price are 0.587 and 0.399 in the rational expectations and adap- 
tive expectations formulations. Similarly, rational and adaptive expectations formula- 
tions yield long-run supply elasticities of conditional variance of —0.030 and —0.013, 
respectively. Not surprisingly, the adaptive expectations model suggests a more slug- 
gish supply response than does the forward-looking rational expectations model. 


6. THE ARCH-M MODEL 


Engle, Lilien, and Robins (1987) extended the basic ARCH framework to allow the 
mean of a sequence to depend on its own conditional variance. This class of model, 
called the ARCH in mean (ARCH-M) model, is particularly suited to the study of asset 
markets. The basic insight is that risk-averse agents will require compensation for hold- 
ing a risky asset. Given that an asset’s riskiness can be measured by the variance of 
returns, the risk premium will be an increasing function of the conditional variance of 
returns. Engle, Lilien, and Robins express this idea by writing the excess return from 
holding a risky asset as 


$$y_t = \mu_t + \varepsilon_t \tag{3.23}$$


where: $y_t$ = excess return from holding a long-term asset relative to a one-period
treasury bill
$\mu_t$ = risk premium necessary to induce the risk-averse agent to hold the
long-term asset rather than the one-period bond
$\varepsilon_t$ = unforecastable shock to the excess return on the long-term asset


To explain (3.23), note that the expected excess return from holding the long-term 
asset must be just equal to the risk premium: 
EY; = My 


Engle, Lilien, and Robins assume that the risk premium is an increasing function
of the conditional variance of $\varepsilon_t$; in other words, the greater the conditional
variance of returns, the greater the compensation necessary to induce the agent to hold the
long-term asset. Mathematically, if $h_t$ is the conditional variance of $\varepsilon_t$, the
risk premium can be expressed as

$$\mu_t = \beta + \delta h_t, \qquad \delta > 0 \tag{3.24}$$

where $h_t$ is the ARCH(q) process:

$$h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 \tag{3.25}$$

As a set, equations (3.23), (3.24), and (3.25) constitute the basic ARCH-M model.

From (3.23) and (3.24), the conditional mean of $y_t$ depends on the conditional
variance $h_t$. From (3.25), the conditional variance is an ARCH(q) process. It should be
pointed out that, if the conditional variance is constant (i.e., if
$\alpha_1 = \alpha_2 = \cdots = \alpha_q = 0$), the ARCH-M model degenerates into the more
traditional case of a constant risk premium.


Figure 3.9 illustrates two different ARCH-M processes. Panel (a) of the figure
shows 60 realizations of a simulated white-noise process denoted by $\{\varepsilon_t\}$. Note
the temporary increase in volatility during periods 20 to 30. By initializing
$\varepsilon_0 = 0$, the conditional variance was constructed as the first-order ARCH process:

$$h_t = 1 + 0.65\varepsilon_{t-1}^2$$

As you can see in Panel (b), the volatility in $\{\varepsilon_t\}$ translates into increases in
conditional variance. Note that large positive and negative realizations of
$\varepsilon_{t-1}$ result in a large value of $h_t$; it is the square of each
$\{\varepsilon_t\}$ realization that enters the conditional variance. In Panel (c), the values
of $\beta$ and $\delta$ are set equal to −4 and +4, respectively. As such, the $y_t$ sequence
is constructed as $y_t = -4 + 4h_t + \varepsilon_t$. You can clearly see that $y_t$ is
above its long-run value during the period of volatility. In the simulation, conditional
volatility translates itself into increases in the values of $\{y_t\}$. In the latter portion
of the sample, the volatility of $\{\varepsilon_t\}$ diminishes, and the values $y_{30}$
through $y_{60}$ fluctuate around their long-run mean.
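The construction of Panels (a) through (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shocks are drawn as standard normal white noise with an arbitrary seed, so the series will not reproduce Figure 3.9, and the function name `simulate_arch_m` is hypothetical.

```python
import random

def simulate_arch_m(T=60, beta=-4.0, delta=4.0, a0=1.0, a1=0.65, seed=0):
    """Build y_t = beta + delta*h_t + eps_t with h_t = a0 + a1*eps_{t-1}^2."""
    rng = random.Random(seed)          # the seed is arbitrary (assumption)
    eps_prev = 0.0                     # initial condition eps_0 = 0, as in the text
    eps, h, y = [], [], []
    for _ in range(T):
        h_t = a0 + a1 * eps_prev ** 2        # conditional variance from last shock
        eps_t = rng.gauss(0.0, 1.0)          # white-noise shock, assumed N(0, 1)
        y_t = beta + delta * h_t + eps_t     # the mean of y_t rises with h_t
        eps.append(eps_t)
        h.append(h_t)
        y.append(y_t)
        eps_prev = eps_t
    return eps, h, y

# Panel (c) setting from the text: beta = -4 and delta = +4.
eps, h, y = simulate_arch_m()
```

Passing a smaller `delta` weakens the link between the conditional variance and the mean of the series.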

Panel (d) reduces the influence of ARCH-M effects by reducing the magnitude of
$\delta$ and $\beta$ (see Exercise 4). Obviously, if $\delta = 0$, there are no ARCH-M effects
at all. As you can see by comparing the two lower graphs, $y_t$ more closely mimics the
$\varepsilon_t$ sequence when the magnitude of $\delta$ is diminished from $\delta = 4$ to
$\delta = 1$.⁴

FIGURE 3.9 Simulated ARCH-M Processes

As in any ARCH or GARCH model, a Lagrange multiplier test can be used to
detect the presence of conditional volatility. The LM tests are relatively simple to
conduct since they do not require estimation of the full model. The statistic $TR^2$ is
asymptotically distributed as $\chi^2$ with degrees of freedom equal to the number of
restrictions.
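As a rough illustration of how $TR^2$ is formed, the sketch below handles the simplest case of a single restriction: the squared residuals are regressed on a constant and their own first lag. The function name `arch_lm_statistic` and the toy residuals are assumptions made for the example; 6.635 is the 1% critical value of $\chi^2$ with 1 degree of freedom.

```python
def arch_lm_statistic(resid):
    """Return T*R^2 from regressing e_t^2 on a constant and e_{t-1}^2."""
    sq = [e * e for e in resid]
    y, x = sq[1:], sq[:-1]             # current and lagged squared residuals
    T = len(y)                         # number of usable observations
    x_bar, y_bar = sum(x) / T, sum(y) / T
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r2 = sxy ** 2 / (sxx * syy)        # R^2 of a one-regressor OLS regression
    return T * r2

CHI2_1DF_1PCT = 6.635                  # 1% critical value, 1 degree of freedom

# Toy residuals whose squares alternate between 1 and 4 (hypothetical data).
stat = arch_lm_statistic([1.0, 2.0] * 5)
reject_no_arch = stat > CHI2_1DF_1PCT
```

With more than one lag the same idea applies, but the auxiliary regression then needs a multiple-regression routine.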


Implementation 


Using quarterly data from 1960Q1 to 1984Q2, Engle, Lilien, and Robins (1987)
constructed the excess yield on 6-month treasury bills as follows. Let $r_t$ denote the
quarterly yield on a 3-month treasury bill held from $t$ to $(t + 1)$. Rolling over all
proceeds, at the end of two quarters, an individual investing one dollar at the beginning
of period $t$ will have $(1 + r_t)(1 + r_{t+1})$ dollars. In the same fashion, if $R_t$
denotes the quarterly yield on a 6-month treasury bill, buying and holding the 6-month bill
for the full two quarters will result in $(1 + R_t)^2$ dollars. The excess yield, $y_t$, due
to holding the 6-month bill is

$$y_t = (1 + R_t)^2 - (1 + r_{t+1})(1 + r_t)$$

which is approximately equal to

$$y_t = 2R_t - r_{t+1} - r_t$$
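A quick numerical check shows why the approximation is close: the two expressions differ only by the second-order term $R_t^2 - r_t r_{t+1}$. The yields used below are hypothetical and the function names are illustrative.

```python
def excess_yield_exact(R, r_now, r_next):
    """(1 + R_t)^2 - (1 + r_{t+1})(1 + r_t)."""
    return (1 + R) ** 2 - (1 + r_next) * (1 + r_now)

def excess_yield_approx(R, r_now, r_next):
    """2*R_t - r_{t+1} - r_t."""
    return 2 * R - r_next - r_now

# Hypothetical quarterly yields expressed as decimals.
R, r_now, r_next = 0.020, 0.018, 0.019
exact = excess_yield_exact(R, r_now, r_next)
approx = excess_yield_approx(R, r_now, r_next)
gap = exact - approx    # equals R**2 - r_now*r_next, a second-order term
```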


The results from regressing the excess yield on a constant are as follows, with the
t-statistic in parentheses:

$$y_t = \underset{(4.04)}{0.142} + \varepsilon_t \tag{3.26}$$

The excess yield of 0.142% per quarter is more than four standard deviations from
zero. The problem with this estimation method is that the post-1979 period showed
markedly higher volatility than the earlier sample period. To test for the presence of
ARCH errors, the squared residuals were regressed on a weighted average of past
squared residuals, as in (3.11). The LM test for the restriction $\alpha_1 = 0$ yields a value
of $TR^2 = 10.1$, which has a $\chi^2$ distribution with 1 degree of freedom. At the 1%
significance level, the critical value of $\chi^2$ with 1 degree of freedom is 6.635; hence,
there is strong evidence of heteroskedasticity. Thus, there appear to be ARCH errors; as a
result, (3.26) is misspecified if individuals demand a risk premium.

The maximum likelihood estimates of the ARCH-M model and the associated
t-statistics are

$$y_t = \underset{(-1.29)}{-0.0241} + \underset{(5.15)}{0.687}\,h_t + \varepsilon_t$$

$$h_t = \underset{(1.08)}{0.0023} + \underset{(6.30)}{1.64}\left(0.4\varepsilon_{t-1}^2 + 0.3\varepsilon_{t-2}^2 + 0.2\varepsilon_{t-3}^2 + 0.1\varepsilon_{t-4}^2\right)$$

The estimated coefficients imply a time-varying risk premium. The estimated
parameter of the ARCH equation of 1.64 implies that the unconditional variance is
infinite. Although this is troublesome, the conditional variance is finite. Shocks to
$\varepsilon_{t-i}$ act to increase the conditional variance so that there are periods of
tranquility and volatility. During volatile periods, the risk premium rises as risk-averse
agents seek assets that are conditionally less risky.

Exercise 6 asks you to estimate such an ARCH-M model using simulated data.
The questions are designed to guide you through a typical estimation procedure.


7. ADDITIONAL PROPERTIES OF GARCH PROCESSES


Whenever you estimate a GARCH process, you will be estimating the two interrelated
equations

$$y_t = a_0 + \beta x_t + \varepsilon_t$$

and

$$\varepsilon_t = v_t\left(\alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \cdots + \alpha_q\varepsilon_{t-q}^2 + \beta_1 h_{t-1} + \cdots + \beta_p h_{t-p}\right)^{1/2} \tag{3.27}$$

where $x_t$ can contain exogenous variables and/or an ARMA process of order $(p'', q'')$.

The first equation is a model of the mean and the second yields the model of the
variance. The symbols $p''$ and $q''$ are used to denote that the order of the ARMA process
for the mean need not equal the order of the GARCH(p, q) equation. The two equations
are related in that $h_t$ is the conditional variance of $\varepsilon_t$; hence, the GARCH
process of (3.27) is the conditional variance of the mean equation. Do not make the mistake
of assuming that $\varepsilon_t^2$ is the conditional variance itself. Given that
$\varepsilon_t = v_t(h_t)^{0.5}$, it follows that the relationship between $h_t$ and
$\varepsilon_t^2$ is

$$\varepsilon_t^2 = v_t^2 h_t$$

and, since $E_{t-1}v_t^2 = Ev_t^2 = 1$,

$$E_{t-1}\varepsilon_t^2 = h_t$$

Thus, $h_t$ is the conditional variance of the $\{\varepsilon_t\}$ sequence.
A GARCH(1, 1) specification is the most popular form of conditional volatility.
This is especially true for financial data where volatility shocks are very persistent. As
such, it is worthwhile to pay special attention to this form of GARCH process.


Properties of GARCH(1, 1) Error Processes 


Given the large number of GARCH(1, 1) models found in the literature, it is desirable
to establish the properties of this particular type of error process. In doing so, we
can generalize some of the discussion of ARCH(1) models presented in Section 2. If
you take the conditional expectation of the GARCH(1, 1) process, you should have no
trouble verifying that

$$E_{t-1}\varepsilon_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1}$$

or

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1} \tag{3.28}$$

The mean of $\varepsilon_t$: The unconditional mean of $\varepsilon_t$ is zero. If you take
the expected value of (3.27), you obtain

$$E\varepsilon_t = E[v_t(h_t)^{1/2}]$$

Since $h_t$ does not depend on $v_t$ and $Ev_t = 0$, it immediately follows that
$E\varepsilon_t = 0$.

The variance of $\varepsilon_t$: Since

$$\varepsilon_t^2 = v_t^2\left(\alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1}\right)$$

it follows that the unconditional variance of a GARCH(1, 1) process is

$$E\varepsilon_t^2 = Ev_t^2\left(\alpha_0 + \alpha_1 E\varepsilon_{t-1}^2 + \beta_1 Eh_{t-1}\right) \tag{3.29}$$

We can simplify this expression if we recognize that $Ev_t^2 = 1$ and
$E\varepsilon_{t-1}^2 = Eh_{t-1}$. This second relationship follows from the law of iterated
expectations. The form of the law we need guarantees that
$E\varepsilon_t^2 = E(E_{t-1}\varepsilon_t^2)$. In essence, the unconditional expectation of
the conditional variance is just the unconditional variance. As such, we can lag the
relationship one period and write $E\varepsilon_{t-1}^2 = E(E_{t-2}\varepsilon_{t-1}^2)$ so that
$E\varepsilon_{t-1}^2 = E(h_{t-1})$. If we substitute this condition into (3.29), it follows
that

$$E\varepsilon_t^2 = \alpha_0 + (\alpha_1 + \beta_1)E\varepsilon_{t-1}^2$$

Since the unconditional variances are such that $E\varepsilon_t^2 = E\varepsilon_{t-1}^2$, the
solution for the unconditional variance is clear. Given that $\alpha_1 + \beta_1 < 1$, the
unconditional variance is

$$E\varepsilon_t^2 = \alpha_0/(1 - \alpha_1 - \beta_1)$$


For the more general GARCH(p, q) model, it follows that the variance will be
finite if

$$1 - \sum_{i=1}^{q}\alpha_i - \sum_{i=1}^{p}\beta_i > 0$$
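The finiteness condition and the implied unconditional variance are easy to check numerically. The helper below is a minimal sketch; its name and the parameter values in the example are illustrative assumptions.

```python
def garch_unconditional_variance(a0, alphas, betas):
    """Return a0 / (1 - sum(alphas) - sum(betas)) when the variance is finite."""
    persistence = sum(alphas) + sum(betas)
    if persistence >= 1:
        raise ValueError("variance is not finite: sum of alphas and betas >= 1")
    return a0 / (1 - persistence)

# Illustrative GARCH(1, 1): alpha_0 = 1, alpha_1 = 0.6, beta_1 = 0.2.
var_11 = garch_unconditional_variance(1.0, [0.6], [0.2])   # 1 / (1 - 0.8)
```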


The autocorrelation function: The autocorrelations $E\varepsilon_t\varepsilon_{t-j}$ are all
equal to zero. Consider

$$E\varepsilon_t\varepsilon_{t-j} = E\left[v_t(h_t)^{1/2}\,v_{t-j}(h_{t-j})^{1/2}\right]$$

Since $h_t$, $v_{t-j}$, and $h_{t-j}$ do not depend on the value of $v_t$ and $Ev_t = 0$, it
follows that all autocorrelations are zero for $j \neq 0$.

The conditional variance: The conditional variance of the error process is $h_t$.
Consider

$$E_{t-1}\varepsilon_t^2 = E_{t-1}v_t^2 h_t = h_t$$

This simple result is the essential feature of GARCH modeling. The conditional
variance of the error process is not constant. With the appropriate specification of the
parameters of $h_t$, it is possible to model and forecast the conditional variance of the
$\{y_t\}$ process.

Volatility persistence: In a GARCH process, the errors are uncorrelated in that
$E\varepsilon_t\varepsilon_{t-j} = 0$. However, as shown in (3.28), the squared errors of a
GARCH(1, 1) process are correlated. You should be able to show that the degree of
autoregressive decay of the squared residuals is $(\alpha_1 + \beta_1)$. In fact, the ACF of
the squared residuals of a GARCH(1, 1) process tends to behave like that of an ARMA(1, 1)
process.

Large values of both $\alpha_1$ and $\beta_1$ act to increase the conditional volatility but
they do so in different ways. The larger is $\alpha_1$, the larger is the response of $h_t$
to new information; clearly, if $\alpha_1$ is large, a $v_t$ shock has a sizable effect on
$\varepsilon_t^2$ and $h_{t+1}$. To illustrate the point, two GARCH(1, 1) processes were
simulated using the identical set of random numbers for the $\{v_t\}$ sequence. In both
cases, $h_0$ and $\varepsilon_0$ were initialized, and the remaining values of the series were
constructed using the relationship $\varepsilon_t^2 = v_t^2 h_t$ and

Model 1: $h_t = 1 + 0.6\varepsilon_{t-1}^2 + 0.2h_{t-1}$
Model 2: $h_t = 1 + 0.2\varepsilon_{t-1}^2 + 0.6h_{t-1}$

In order to avoid the effect of selecting the specific values for the initial conditions,
the first 100 realizations were eliminated; the remaining 250 realizations are shown in
Figure 3.10. Given the value of $h_t$, a large $v_t$ shock has its immediate effect on
$\varepsilon_t^2$. Since Model 1 has a larger value of $\alpha_1$, the effect of this shock is
very pronounced in period $t + 1$. For Model 2, $\alpha_1$ is equal to only 0.2 so that peaks
in the $\{h_t\}$ series are not as large as those from Model 1. However, since Model 2 has
the larger value of $\beta_1$, its conditional variance displays more autoregressive
persistence.
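The experiment can be sketched as follows. The seed, the normality of $\{v_t\}$ and the initial values $h_0 = 1$, $\varepsilon_0 = 0$ are assumptions made for illustration, so the output will not match Figure 3.10 exactly.

```python
import random

def simulate_garch11(a0, a1, b1, v, h0=1.0, eps0=0.0):
    """Generate eps_t = v_t*sqrt(h_t) with h_t = a0 + a1*eps_{t-1}^2 + b1*h_{t-1}."""
    h_prev, eps_prev = h0, eps0
    eps, h = [], []
    for v_t in v:
        h_t = a0 + a1 * eps_prev ** 2 + b1 * h_prev
        eps_t = v_t * h_t ** 0.5
        eps.append(eps_t)
        h.append(h_t)
        h_prev, eps_prev = h_t, eps_t
    return eps, h

rng = random.Random(12345)                         # arbitrary seed (assumption)
v = [rng.gauss(0.0, 1.0) for _ in range(350)]      # identical shocks for both models

eps1, h1 = simulate_garch11(1.0, 0.6, 0.2, v)      # Model 1
eps2, h2 = simulate_garch11(1.0, 0.2, 0.6, v)      # Model 2

# Drop the first 100 realizations to limit the influence of initial conditions.
h1, h2 = h1[100:], h2[100:]
```

Plotting `h1` and `h2` against time should show the qualitative difference described above.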

Also note that the value of $\alpha_1$ must be strictly positive. Hence, the analogy between
the ACF of the squared residuals of a GARCH(1, 1) and the ACF of the residuals from
an ARMA(1, 1) process is not perfect. If $\alpha_1 = 0$, it is possible to write (3.28) as

$$h_t = \alpha_0 + \beta_1 h_{t-1}$$

so that there is no way for the $\{\varepsilon_t\}$ series to affect the $\{h_t\}$ series. As
such, the model for the conditional variance cannot be identified. The analogy is even less
clear in the more general case of a GARCH(p, q) process. Bollerslev (1986) proves that the
ACF of the squared residuals resulting from a GARCH(p, q) process acts like that of
an ARMA(m, p) process where m = max(p, q). This makes identification of the most
appropriate values of p and q somewhat difficult. Question 3 at the end of the chapter
guides you through a proof that the ACF of a GARCH(2, 1) has the same properties as
the ACF of a GARCH(2, 2) model.

FIGURE 3.10 Persistence in the GARCH(1,1) Model


Assessing the Fit 


One way to assess the adequacy of a GARCH model is to see how well it fits the
data. It is now standard to assess the fit of a GARCH model using model selection
criteria such as the AIC and SBC discussed in Chapter 2. First consider the sum of
squared residuals (SSR) as a measure of the goodness of fit. Since
$SSR = \sum\varepsilon_t^2$, the sum of the squared residuals actually measures squared
deviations of the model of the mean. Moreover, since $\varepsilon_t = v_t(h_t)^{1/2}$, the
pure innovations in the GARCH model are given by the $v_t$ sequence. Instead of using SSR,
in a GARCH model, a reasonable measure of the goodness of fit is the sum of squares of the
$\{v_t\}$ sequence

$$SSR' = \sum_{t=1}^{T} v_t^2$$

Given that $\varepsilon_t = v_t(h_t)^{1/2}$, you can also write $SSR'$ as

$$SSR' = \sum_{t=1}^{T}\left(\varepsilon_t^2/h_t\right) \tag{3.30}$$

The point is that $SSR'$ is a measure of the squared errors relative to the fitted values
of the conditional variance. Since $SSR'$ will be small if the fitted values of $h_t$ are
close to $\varepsilon_t^2$, you can select the model that yields the smallest value of
$SSR'$. Another way to make the same point is to recognize that $\varepsilon_t/h_t^{0.5}$ is a
standardized residual in that the value of $\varepsilon_t$ is divided by its conditional
standard error. Hence, $SSR'$ measures the sum of squares of the standardized residuals.

Another goodness-of-fit measure is simply the maximized value of the likelihood
function. As explained in more detail in Section 8, if you assume that the error process
is normal, the maximized value of the log likelihood function can be written such that

$$2\ln L = -\sum_{t=1}^{T}\left[\ln(h_t) + \varepsilon_t^2/h_t\right] - T\ln(2\pi)$$

$$\phantom{2\ln L} = -\sum_{t=1}^{T}\left[\ln(h_t) + v_t^2\right] - T\ln(2\pi)$$

where $L$ = maximized value of the likelihood function.
Hence, models with a large value of $L$ will tend to have small values of $h_t$ and/or
small values of $SSR'$. Notice that $L$ does not include a penalty for the estimation of
additional parameters. However, you can construct the AIC and SBC using

$$AIC = -2\ln L + 2n$$
$$SBC = -2\ln L + n\ln(T)$$

where $L$ is defined above and $n$ is the number of estimated parameters. As discussed in
Chapter 2, some programmers will not incorporate the expression $-T\ln(2\pi)$ into the
calculation of the likelihood function when reporting model-selection criteria.
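The measures in this subsection can be computed directly from the residuals and the fitted conditional variances. The function below is a sketch under the normality assumption; the name `garch_fit_measures` and the toy inputs are hypothetical, and the term $-T\ln(2\pi)$ is included.

```python
import math

def garch_fit_measures(eps, h, n_params):
    """Return (SSR', ln L, AIC, SBC) from residuals eps and fitted variances h."""
    T = len(eps)
    ssr_prime = sum(e * e / ht for e, ht in zip(eps, h))     # equation (3.30)
    sum_log_h = sum(math.log(ht) for ht in h)
    log_l = -0.5 * (sum_log_h + ssr_prime + T * math.log(2 * math.pi))
    aic = -2 * log_l + 2 * n_params
    sbc = -2 * log_l + n_params * math.log(T)
    return ssr_prime, log_l, aic, sbc

# Toy inputs (hypothetical): three residuals and their fitted variances.
ssr_prime, log_l, aic, sbc = garch_fit_measures(
    [1.0, -1.0, 2.0], [1.0, 1.0, 4.0], n_params=3
)
```

Because packages differ on whether the constant term is included, criteria should only be compared when they are computed the same way.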




Diagnostic Checks for Model Adequacy 


In addition to providing a good fit, an estimated GARCH model should capture all
dynamic aspects of the model of the mean and the model of the variance. The estimated
residuals should be serially uncorrelated and should not display any remaining
conditional volatility. You can test to ensure that your model has captured these
properties by standardizing the residuals as indicated above. Simply divide
$\hat{\varepsilon}_t$ by $\hat{h}_t^{1/2}$ in order to obtain an estimate of what we have been
calling the $\{v_t\}$ sequence. Since $\varepsilon_t$ has a zero mean and a variance of
$h_t$, you can think of $v_t = \varepsilon_t/(h_t)^{1/2}$ as the standardized value of
$\varepsilon_t$. The resulting series, which we will call $s_t$, should have a mean of zero
and a variance of unity.

If there is any serial correlation in the $\{s_t\}$ sequence, the model of the mean is not
properly specified. To test the model of the mean, form the Ljung–Box Q-statistics for
the $\{s_t\}$ sequence. You should not be able to reject the null hypothesis that the
autocorrelations underlying the various Q-statistics are jointly equal to zero.

To test for remaining GARCH effects, form the Ljung–Box Q-statistics of the
squared standardized residuals (i.e., $s_t^2$). The basic idea is that $s_t^2$ is an estimate
of $\varepsilon_t^2/h_t = v_t^2$. Hence, the properties of the $s_t^2$ sequence should mimic
those of $v_t^2$. If there are no remaining GARCH effects, you should not be able to reject
the null hypothesis that the autocorrelations of $s_t^2$ are jointly equal to zero.
Otherwise, there is remaining conditional volatility. If you assumed normality, you should
check to determine whether the estimated $\{v_t\}$ series actually follows a normal
distribution.
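A minimal sketch of these checks follows, using the standard Ljung–Box formula $Q = T(T+2)\sum_{k=1}^{s} r_k^2/(T-k)$. The function names and the toy series are assumptions for illustration; in practice `resid` and `h` would come from the estimated model, and $Q$ would be compared with a $\chi^2$ critical value.

```python
def standardize(resid, h):
    """Divide each residual by its conditional standard deviation."""
    return [e / ht ** 0.5 for e, ht in zip(resid, h)]

def ljung_box_q(series, lags):
    """Ljung-Box Q-statistic for the first `lags` sample autocorrelations."""
    T = len(series)
    mean = sum(series) / T
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, T)) / denom
        total += r_k ** 2 / (T - k)
    return T * (T + 2) * total

# Toy illustration (hypothetical data): a perfectly alternating series.
q_demo = ljung_box_q([1.0, -1.0] * 5, lags=1)

# Typical use: Q for s_t tests the mean model, Q for s_t^2 tests for GARCH effects.
s = standardize([2.0, -3.0, 1.0, -1.0], [4.0, 9.0, 1.0, 1.0])
s_squared = [v * v for v in s]
```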

Once you have obtained a satisfactory model, you can forecast future values of
$y_t$ and its conditional variance. Moreover, you can place confidence bands around the
forecast using the estimates of conditional standard deviation. Since
$E_t\varepsilon_{t+1}^2 = h_{t+1}$, a two-standard-deviation confidence interval for your
forecast can be constructed using

$$E_t y_{t+1} \pm 2(h_{t+1})^{0.5}$$

The result is quite general; since the mean of each value of $\{\varepsilon_t\}$ is zero, the
optimal j-step-ahead forecast of $y_{t+j}$ does not depend on the presence of GARCH errors.
However, the size of any confidence interval surrounding the forecasts does depend on the
conditional volatility. Clearly, in times when there is substantial conditional volatility
(i.e., when $h_{t+1}$ is large), the variance of the forecast error will be large. Simply
put, we cannot be as confident of our forecasts in periods when conditional volatility is
high.


Forecasting the Conditional Variance 


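The one-step-ahead forecast of the conditional variance is easy to obtain. If we update
$h_t$ by one period, we find

$$h_{t+1} = \alpha_0 + \alpha_1\varepsilon_t^2 + \beta_1 h_t$$

Since $\varepsilon_t^2$ and $h_t$ are known in period $t$, the one-step-ahead forecast is
simply $\alpha_0 + \alpha_1\varepsilon_t^2 + \beta_1 h_t$. It is only somewhat more difficult
to obtain the j-step-ahead forecasts. To begin, use the fact that
$\varepsilon_t^2 = v_t^2 h_t$ so that $\varepsilon_{t+j}^2 = v_{t+j}^2 h_{t+j}$. If you update
by j periods and take the conditional expectation of each side, it should be clear that

$$E_t\varepsilon_{t+j}^2 = E_t\left(v_{t+j}^2 h_{t+j}\right)$$

Since $v_{t+j}$ is independent of $h_{t+j}$ and $E_t v_{t+j}^2 = 1$, it follows that

$$E_t\varepsilon_{t+j}^2 = E_t h_{t+j} \tag{3.31}$$

We can use (3.31) to obtain the forecasts of the conditional variance of the
GARCH(1, 1) process. Update (3.28) by j periods to obtain

$$h_{t+j} = \alpha_0 + \alpha_1\varepsilon_{t+j-1}^2 + \beta_1 h_{t+j-1}$$

and take the conditional expectation

$$E_t h_{t+j} = \alpha_0 + \alpha_1 E_t\varepsilon_{t+j-1}^2 + \beta_1 E_t h_{t+j-1}$$

If you combine this relationship with (3.31), it is easy to verify that

$$E_t h_{t+j} = \alpha_0 + (\alpha_1 + \beta_1)E_t h_{t+j-1} \tag{3.32}$$

Thus, (3.32) can be viewed as a first-order difference equation in the $E_t h_{t+j}$
sequence with the initial condition for $h_t$. Given $h_t$, we can use (3.32) to forecast all
subsequent values of the conditional variance as

$$E_t h_{t+j} = \alpha_0\left[1 + (\alpha_1 + \beta_1) + (\alpha_1 + \beta_1)^2 + \cdots + (\alpha_1 + \beta_1)^{j-1}\right] + (\alpha_1 + \beta_1)^j h_t$$

Strictly speaking, (3.31) holds for $j \geq 1$, so (3.32) applies for $j \geq 2$; because
$\varepsilon_t^2$ is known at $t$, an exact forecast starts the recursion from
$h_{t+1} = \alpha_0 + \alpha_1\varepsilon_t^2 + \beta_1 h_t$ rather than from $h_t$.
If $\alpha_1 + \beta_1 < 1$, the conditional forecasts of $h_{t+j}$ will converge to the
long-run value

$$Eh_t = \alpha_0/(1 - \alpha_1 - \beta_1)$$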
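The recursion in (3.32) is straightforward to code. The sketch below starts from the exact one-step-ahead value $h_{t+1} = \alpha_0 + \alpha_1\varepsilon_t^2 + \beta_1 h_t$ and then iterates (3.32); the function name and the parameter values are illustrative assumptions.

```python
def forecast_garch11_variance(a0, a1, b1, eps_t, h_t, horizon):
    """Return [E_t h_{t+1}, ..., E_t h_{t+horizon}] for a GARCH(1, 1) process."""
    forecast = a0 + a1 * eps_t ** 2 + b1 * h_t    # h_{t+1} is known at time t
    forecasts = [forecast]
    for _ in range(1, horizon):
        forecast = a0 + (a1 + b1) * forecast      # equation (3.32)
        forecasts.append(forecast)
    return forecasts

# Illustrative values: alpha_0 = 1, alpha_1 = 0.2, beta_1 = 0.6, eps_t = 0, h_t = 5.
path = forecast_garch11_variance(1.0, 0.2, 0.6, eps_t=0.0, h_t=5.0, horizon=200)
long_run = 1.0 / (1 - 0.2 - 0.6)                  # alpha_0 / (1 - alpha_1 - beta_1)
```

With $\alpha_1 + \beta_1 < 1$, the end of `path` settles at `long_run`, in line with the convergence result above.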
Similarly, we can forecast the conditional variance of the ARCH(q) process

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \cdots + \alpha_q\varepsilon_{t-q}^2 \tag{3.33}$$

If we update (3.33) by one period, we obtain

$$h_{t+1} = \alpha_0 + \alpha_1\varepsilon_t^2 + \cdots + \alpha_q\varepsilon_{t+1-q}^2$$

As mentioned above, at period $t$, we have all of the information necessary to calculate
the value of $h_{t+1}$ for any GARCH process. Now, if we update (3.33) by two periods
and take the conditional expectation, we obtain

$$E_t h_{t+2} = \alpha_0 + \alpha_1 E_t\varepsilon_{t+1}^2 + \cdots + \alpha_q\varepsilon_{t+2-q}^2$$

Since $E_t\varepsilon_{t+1}^2 = h_{t+1}$, it follows that

$$E_t h_{t+2} = \alpha_0 + \alpha_1 h_{t+1} + \cdots + \alpha_q\varepsilon_{t+2-q}^2$$

The point is that it is possible to obtain the j-step-ahead forecasts of the conditional
variance recursively. As $j \to \infty$, the forecasts of $h_{t+j}$ should converge to
the unconditional mean

$$E\varepsilon_t^2 = \alpha_0/(1 - \alpha_1 - \alpha_2 - \cdots - \alpha_q)$$

It should be clear that a necessary condition for convergence is for the roots
of the inverse characteristic equation $1 - \alpha_1 L - \cdots - \alpha_q L^q$ to lie outside
the unit circle. This is a necessary condition for the long-run mean to have the
representation $\alpha_0/(1 - \sum\alpha_i)$. To ensure that the variance is always positive,
we also require that $\alpha_0 > 0$ and $\alpha_i \geq 0$ for $i \geq 1$.

You should have no trouble generalizing these results to the general GARCH(p, q)
process. Fortunately, most statistical software packages can perform these calculations
automatically.


8. MAXIMUM LIKELIHOOD ESTIMATION OF GARCH MODELS


Many software packages contain built-in routines that estimate GARCH and ARCH-M
models such that the researcher simply specifies the order of the process and the
computer does the rest. Even if you have access to an automated routine, it is important
to understand the numerical procedures used by your software package. Other packages
require user input in the form of a small optimization algorithm. This section
explains the maximum likelihood methods required to understand and write a program
for GARCH-type models.

Suppose that values of $\{\varepsilon_t\}$ are drawn from a normal distribution having a mean
of zero and a constant variance $\sigma^2$. From standard distribution theory, the
likelihood of any realization of $\varepsilon_t$ is

$$L_t = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(\frac{-\varepsilon_t^2}{2\sigma^2}\right)$$

where $L_t$ is the likelihood of $\varepsilon_t$.

Since the realizations of $\{\varepsilon_t\}$ are independent, the likelihood of the joint
realizations of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T$ is the product of the
individual likelihoods. Hence, if all have the same variance, the likelihood of the joint
realizations is

$$L = \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(\frac{-\varepsilon_t^2}{2\sigma^2}\right)$$

It is far easier to work with a sum than with a product. As such, it is convenient to
take the natural log of each side so as to obtain

$$\ln L = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t^2 \tag{3.34}$$

The procedure used in maximum likelihood estimation is to select the distributional
parameters so as to maximize the likelihood of drawing the observed sample.
In Appendix 1 of Chapter 2, we considered the case where the $\{\varepsilon_t\}$ sequence was
an MA(1) process. In the example at hand, suppose that $\{\varepsilon_t\}$ is generated from
the model:

$$\varepsilon_t = y_t - \beta x_t \tag{3.35}$$

In the classical regression model, the mean of $\varepsilon_t$ is assumed to be zero, the
variance is the constant $\sigma^2$, and the various realizations of $\{\varepsilon_t\}$ are
independent. Using a sample with $T$ observations, we can substitute (3.35) into the
log-likelihood function given by (3.34) to obtain

$$\ln L = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - \beta x_t)^2 \tag{3.36}$$


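Differentiating the log-likelihood function (3.36) with respect to $\sigma^2$ and $\beta$
yields

$$\frac{\partial\ln L}{\partial\sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(y_t - \beta x_t)^2$$

and

$$\frac{\partial\ln L}{\partial\beta} = \frac{1}{\sigma^2}\sum_{t=1}^{T}\left(y_t x_t - \beta x_t^2\right) \tag{3.37}$$

Setting these partial derivatives equal to zero and solving for the values of $\sigma^2$ and
$\beta$ that yield the maximum value of $\ln L$ results in the familiar OLS estimates of the
variance and $\beta$ (denoted by $\hat{\sigma}^2$ and $\hat{\beta}$). Hence,

$$\hat{\sigma}^2 = \sum\varepsilon_t^2/T \tag{3.38}$$

and

$$\hat{\beta} = \sum x_t y_t \Big/ \sum x_t^2 \tag{3.39}$$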

All of this should be familiar ground since most econometric texts concerned with
regression analysis discuss maximum likelihood estimation. The point to emphasize
here is that the first-order conditions are easily solved since they are all linear.
Calculating the appropriate sums may be tedious, but the methodology is straightforward.
Unfortunately, this is not the case in estimating an ARCH or GARCH model since the
first-order equations are nonlinear. Instead, the solution requires some sort of search
algorithm. The simplest way to illustrate the issue is to introduce an ARCH(1) error
process into the regression model given by (3.35). Continue to assume that
$\varepsilon_t$ is the error term in the linear equation $y_t - \beta x_t = \varepsilon_t$.
Now let $\varepsilon_t$ be given by

$$\varepsilon_t = v_t(h_t)^{0.5}$$

Although the conditional variance of $\varepsilon_t$ is not constant, the necessary
modification of (3.34) is clear. Since each realization of $\varepsilon_t$ has the conditional
variance $h_t$, the joint likelihood of realizations $\varepsilon_1$ through $\varepsilon_T$ is

$$L = \prod_{t=1}^{T}\frac{1}{\sqrt{2\pi h_t}}\exp\left(\frac{-\varepsilon_t^2}{2h_t}\right)$$

so that the log-likelihood function is

$$\ln L = -\frac{T}{2}\ln(2\pi) - 0.5\sum_{t=1}^{T}\ln h_t - 0.5\sum_{t=1}^{T}\left(\varepsilon_t^2/h_t\right)$$

Now suppose that $\varepsilon_t = y_t - \beta x_t$ and that the conditional variance is the
ARCH(1) process $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$. Substituting for $h_t$ and
$\varepsilon_t$ yields

$$\ln L = -\frac{T-1}{2}\ln(2\pi) - 0.5\sum_{t=2}^{T}\ln\left(\alpha_0 + \alpha_1\varepsilon_{t-1}^2\right) - \frac{1}{2}\sum_{t=2}^{T}\left[(y_t - \beta x_t)^2\big/\left(\alpha_0 + \alpha_1\varepsilon_{t-1}^2\right)\right]$$

Note that the initial observation is lost since $\varepsilon_0$ is outside the sample. Once
you substitute $(y_{t-1} - \beta x_{t-1})^2$ for $\varepsilon_{t-1}^2$, it is possible to
maximize $\ln L$ with respect to $\alpha_0$, $\alpha_1$, and $\beta$. As you can surmise, there
are no analytic solutions to the first-order conditions for a maximum. Fortunately,
computers are able to select the parameter values that maximize this log-likelihood
function. In most time-series software packages, the procedure necessary to write such
programs is quite simple. Be aware that numerical optimization routines cannot guarantee
exact solutions for the estimated coefficients. Instead, various “hill-climbing” methods
are used to find the parameter values that maximize $\ln L$. If the partial derivatives of
the likelihood function are close to zero (so that the likelihood function is flat), the
algorithms may not be able to find a maximum. Section 4.4 of the Programming Manual
discusses the Simplex algorithm and the so-called BFGS algorithm often used in the maximum
likelihood estimates of a GARCH model.
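To make the idea of a search algorithm concrete, the sketch below codes the ARCH(1) log-likelihood and maximizes it with a deliberately crude coordinate search. This is not the Simplex or BFGS method mentioned above; it is a stand-in chosen so that the example needs only the standard library. The data are simulated, and every name and starting value is an assumption.

```python
import math
import random

def arch1_loglik(params, y, x):
    """Log-likelihood of y_t = beta*x_t + eps_t with h_t = a0 + a1*eps_{t-1}^2."""
    a0, a1, beta = params
    if a0 <= 0 or a1 < 0:
        return -math.inf                       # rule out nonpositive variances
    eps = [yt - beta * xt for yt, xt in zip(y, x)]
    T = len(eps)
    ll = -0.5 * (T - 1) * math.log(2 * math.pi)
    for t in range(1, T):                      # the first observation is lost
        h = a0 + a1 * eps[t - 1] ** 2
        ll -= 0.5 * (math.log(h) + eps[t] ** 2 / h)
    return ll

def hill_climb(f, start, step=0.5, tol=1e-6, max_iter=10000):
    """Coordinate search: try +/- step on each parameter, halve step if stuck."""
    best, best_val = list(start), f(start)
    iterations = 0
    while step > tol and iterations < max_iter:
        improved = False
        for i in range(len(best)):
            for move in (step, -step):
                trial = list(best)
                trial[i] += move
                val = f(trial)
                if val > best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            step /= 2
        iterations += 1
    return best, best_val

def simulate_data(T, beta, a0, a1, seed=0):
    """Simulate a regression with ARCH(1) errors (hypothetical test data)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(T)]
    y, eps_prev = [], 0.0
    for t in range(T):
        h = a0 + a1 * eps_prev ** 2
        eps_prev = rng.gauss(0.0, 1.0) * math.sqrt(h)
        y.append(beta * x[t] + eps_prev)
    return y, x

y, x = simulate_data(T=300, beta=2.0, a0=1.0, a1=0.5)
start = [0.5, 0.1, 0.0]                        # initial guess for (a0, a1, beta)
estimate, max_loglik = hill_climb(lambda p: arch1_loglik(p, y, x), start)
```

A search of this kind can stall on a flat likelihood surface, which is the practical reason for checking results against different starting values.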


9. OTHER MODELS OF CONDITIONAL VARIANCE 


Financial analysts are especially keen to obtain precise estimates of the conditional 
variance of an asset price. Since GARCH models can forecast conditional volatility, 
they are able to measure the risk of an asset over the holding period. As such, a number 
of extensions of the basic GARCH model have been developed that are especially suited 
to estimating the conditional volatility of financial instruments. 


The IGARCH Model 


In financial time series, the conditional volatility is typically quite persistent. In fact,
if you estimate a GARCH(1, 1) model using a long time series of stock returns, you
will find that the sum of $\alpha_1$ and $\beta_1$ is very close to unity. Nelson (1990) argued
that constraining $\alpha_1 + \beta_1$ to equal unity can yield a very parsimonious
representation of the distribution of an asset’s return. In some respects, this constraint
forces the conditional variance to act like a process with a unit root. This
integrated-GARCH (IGARCH) specification has some very interesting properties. From (3.32),
if $\alpha_1 + \beta_1 = 1$, the one-step-ahead forecast of the conditional variance is

$$E_t h_{t+1} = \alpha_0 + h_t$$

and the j-step-ahead forecast is

$$E_t h_{t+j} = j\alpha_0 + h_t$$

Thus, except for the intercept term $\alpha_0$, the forecast of the conditional variance for
the next period is the current value of the conditional variance. Moreover, the
unconditional variance is clearly infinite. Nevertheless, Nelson (1990) showed that the
analogy between the IGARCH process and an ARIMA process with a unit root is not perfect.
Given that $\alpha_1 + \beta_1 = 1$ and that $h_{t-1} = Lh_t$, we can write the conditional
variance as

$$h_t = \alpha_0 + (1 - \beta_1)\varepsilon_{t-1}^2 + \beta_1 Lh_t$$

and solving for $h_t$

$$h_t = \alpha_0/(1 - \beta_1) + (1 - \beta_1)\sum_{i=0}^{\infty}\beta_1^i\varepsilon_{t-1-i}^2$$

Thus, unlike a true nonstationary process, the conditional variance is a geometrically
decaying function of the current and past realizations of the $\{\varepsilon_t^2\}$ sequence.
As such, an IGARCH model can be estimated like any other GARCH model.
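The step of solving for the conditional variance can be spelled out as follows, assuming $0 \leq \beta_1 < 1$ so that the lag polynomial $(1 - \beta_1 L)$ can be inverted as a geometric series:

```latex
\begin{aligned}
(1 - \beta_1 L)\,h_t &= \alpha_0 + (1 - \beta_1)\,\varepsilon_{t-1}^2 \\
h_t &= \frac{\alpha_0}{1 - \beta_1}
      + (1 - \beta_1)\sum_{i=0}^{\infty} \beta_1^{\,i} L^{i}\,\varepsilon_{t-1}^2 \\
    &= \frac{\alpha_0}{1 - \beta_1}
      + (1 - \beta_1)\sum_{i=0}^{\infty} \beta_1^{\,i}\,\varepsilon_{t-1-i}^2
\end{aligned}
```

The first term uses the fact that the lag operator leaves a constant unchanged, so $(1 - \beta_1 L)^{-1}\alpha_0 = \alpha_0/(1 - \beta_1)$. The weights $(1 - \beta_1)\beta_1^{\,i}$ sum to one, which is why the second term is a geometrically declining weighted average of past squared shocks.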


Models with Explanatory Variables 


Just as the model of the mean can contain explanatory variables, the specification of
$h_t$ also allows for exogenous variables. In Section 4, the example concerning the Great
Moderation used a dummy variable in the conditional variance equation. Similarly,
suppose that you want to determine whether the terrorist attacks of September 11, 2001,
increased the volatility of asset returns. One way to accomplish the task would be to
create a dummy variable $D_t$ equal to 0 before September 2001 and equal to 1 thereafter.
Consider the following modification of the GARCH(1, 1) specification

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1 h_{t-1} + \gamma D_t$$

If it is found that $\gamma > 0$, it is possible to conclude that the terrorist attacks
increased the mean of the conditional volatility.


Models with Asymmetry: TARCH and EGARCH 


An interesting feature of asset prices is that “bad” news seems to have a more
pronounced effect on volatility than does “good” news. For many stocks, there is a strong
negative correlation between the current return and the future volatility. The reasoning
is that a negative stock price shock reduces the value of a firm’s equity relative to its
debt. As the debt-to-equity (i.e., leverage) ratio rises, the riskiness of holding the firm’s
stock will rise as well. This tendency for volatility to decline when returns rise and to
rise when returns fall is often called the leverage effect. The idea of the leverage effect
is captured in Figure 3.11, where “new information” is measured by the size of
$\varepsilon_t$. If $\varepsilon_t = 0$, expected volatility ($E_t h_{t+1}$) is the distance Oa.
Any news increases volatility; however, if the news is good (i.e., if $\varepsilon_t$ is
positive), volatility increases along ab. If the news is bad, volatility increases along ac.
Since segment ac is steeper than ab, a positive $\varepsilon_t$ shock will have a smaller
effect on volatility than a negative shock of the same magnitude.

Glosten, Jagannathan, and Runkle (1994) showed how to allow the effects of good
and bad news to have different effects on volatility. In a sense, $\varepsilon_{t-1} = 0$ is a
threshold such that shocks greater than the threshold have different effects than shocks
below the threshold. Consider the threshold-GARCH (TARCH) process

$$h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \lambda_1 d_{t-1}\varepsilon_{t-1}^2 + \beta_1 h_{t-1}$$

where $d_{t-1}$ is a dummy variable that is equal to one if $\varepsilon_{t-1} < 0$ and is
equal to zero if $\varepsilon_{t-1} \geq 0$.

FIGURE 3.11 The Leverage Effect

The intuition behind the TARCH model is that positive values of $\varepsilon_{t-1}$ are
associated with a zero value of $d_{t-1}$. Hence, if $\varepsilon_{t-1} \geq 0$, the effect of
an $\varepsilon_{t-1}$ shock on $h_t$ is $\alpha_1\varepsilon_{t-1}^2$. When
$\varepsilon_{t-1} < 0$, $d_{t-1} = 1$, and the effect of an $\varepsilon_{t-1}$ shock on $h_t$
is $(\alpha_1 + \lambda_1)\varepsilon_{t-1}^2$. If $\lambda_1 > 0$, negative shocks will have
larger effects on volatility than positive shocks. You can easily create a dummy variable
$d_t$ and the product $d_{t-1}\varepsilon_{t-1}^2$. If the coefficient $\lambda_1$ is
statistically different from zero, you can conclude that your data contain a threshold
effect.
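The asymmetry is easy to see numerically. The sketch below evaluates the TARCH equation for a positive and a negative shock of equal size; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
def tarch_variance(eps_prev, h_prev, a0, a1, lam1, b1):
    """h_t = a0 + a1*eps^2 + lam1*d*eps^2 + b1*h_prev, with d = 1 if eps_prev < 0."""
    d_prev = 1.0 if eps_prev < 0 else 0.0
    return a0 + a1 * eps_prev ** 2 + lam1 * d_prev * eps_prev ** 2 + b1 * h_prev

# Hypothetical parameter values with lam1 > 0 (bad news matters more).
params = dict(a0=1.0, a1=0.1, lam1=0.2, b1=0.5)

h_good = tarch_variance(+1.0, 2.0, **params)   # 1 + 0.1 + 0.0 + 1.0
h_bad = tarch_variance(-1.0, 2.0, **params)    # 1 + 0.1 + 0.2 + 1.0
```

The difference `h_bad - h_good` equals $\lambda_1\varepsilon_{t-1}^2$, the extra response to a negative shock.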

Another model that allows for the asymmetric effect of news is the exponential-GARCH
model. One problem with a standard GARCH model is that it is necessary
to ensure that all of the estimated coefficients are positive. Nelson (1991) proposed a
specification that does not require nonnegativity constraints. Consider

$$\ln(h_t) = \alpha_0 + \alpha_1\left(\varepsilon_{t-1}/h_{t-1}^{0.5}\right) + \lambda_1\left|\varepsilon_{t-1}/h_{t-1}^{0.5}\right| + \beta_1\ln(h_{t-1}) \tag{3.40}$$

Equation (3.40) is called the exponential-GARCH or EGARCH model. There are
three interesting features to notice about the EGARCH model:

1. The equation for the conditional variance is in log-linear form. Regardless of
the magnitude of $\ln(h_t)$, the implied value of $h_t$ can never be negative. Hence,
it is permissible for the coefficients to be negative.

2. Instead of using the value of $\varepsilon_{t-1}^2$, the EGARCH model uses the level of
the standardized value of $\varepsilon_{t-1}$ [i.e., $\varepsilon_{t-1}$ divided by
$(h_{t-1})^{0.5}$]. Nelson argues that this standardization allows for a more natural
interpretation of the size and persistence of shocks. After all, the standardized value of
$\varepsilon_{t-1}$ is a unit-free measure.

3. The EGARCH model allows for leverage effects. If $\varepsilon_{t-1}/(h_{t-1})^{0.5}$ is
positive, the effect of the shock on the log of the conditional variance is
$\alpha_1 + \lambda_1$. If $\varepsilon_{t-1}/(h_{t-1})^{0.5}$ is negative, the effect of the
shock on the log of the conditional variance is $-\alpha_1 + \lambda_1$.

Although the EGARCH model has some advantages over the TARCH model, it is
difficult to forecast the conditional variance of an EGARCH model. For the TARCH
model, it makes sense to assume that $E_t d_{t+j} = 0.5$. If asset returns are symmetric,
there is a 50:50 chance that the realized value of $\varepsilon_{t+j}$ will be positive.
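The asymmetric response built into (3.40) can be illustrated in the same way. In the sketch below a unit standardized shock of either sign is passed through the EGARCH equation; the parameter values are hypothetical, with $\alpha_1 < 0$ chosen so that a negative shock raises volatility by more than a positive one.

```python
import math

def egarch_variance(eps_prev, h_prev, a0, a1, lam1, b1):
    """Return h_t implied by equation (3.40)."""
    z = eps_prev / math.sqrt(h_prev)                  # standardized shock
    log_h = a0 + a1 * z + lam1 * abs(z) + b1 * math.log(h_prev)
    return math.exp(log_h)                            # always positive

# Hypothetical parameter values; h_{t-1} = 1 so that ln(h_{t-1}) = 0.
params = dict(a0=0.0, a1=-0.1, lam1=0.2, b1=0.9)

h_pos = egarch_variance(+1.0, 1.0, **params)   # ln h_t =  a1 + lam1 = 0.1
h_neg = egarch_variance(-1.0, 1.0, **params)   # ln h_t = -a1 + lam1 = 0.3
```

Because the variance is recovered with an exponential, no sign restrictions on the coefficients are needed to keep it positive.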


Testing for Leverage Effects 


One way to test for leverage is to estimate the TARCH or EGARCH model and perform
a t-test for the null hypothesis $\lambda_1 = 0$. However, there is a specific diagnostic test
that allows you to determine whether there are any leverage effects in your residuals. After
you estimate an ARCH or GARCH model, form the standardized residuals

$$s_t = \hat{\varepsilon}_t/\hat{h}_t^{1/2}$$

Thus, the $\{s_t\}$ sequence consists of each residual divided by its conditional standard
deviation. To test for leverage effects, estimate a regression of the form

$$s_t^2 = a_0 + a_1 s_{t-1} + a_2 s_{t-2} + \cdots$$

If there are no leverage effects, the squared errors should be uncorrelated with
the level of the error terms. Hence, you can conclude that there are leverage effects if
the sample value of $F$ for the null hypothesis $a_1 = a_2 = \cdots = 0$ exceeds the critical
value obtained from an F-table.

Engle and Ng (1993) developed a second way to determine whether positive and
negative shocks have different effects on the conditional variance. Again, let $d_{t-1}$ be a
dummy variable that is equal to 1 if $\varepsilon_{t-1} < 0$ and is equal to zero if $\varepsilon_{t-1} \geq 0$. The test is
to determine whether the estimated squared residuals can be predicted using the $\{d_{t-1}\}$
sequence. The sign bias test uses a regression equation of the form

$$s_t^2 = a_0 + a_1 d_{t-1} + \varepsilon_{1t}$$

where $\varepsilon_{1t}$ is a regression residual.

If a t-test indicates that $a_1$ is statistically different from zero, the sign of the current
period shock is helpful in predicting the conditional volatility. To generalize the test,
you can estimate the regression

$$s_t^2 = a_0 + a_1 d_{t-1} + a_2 d_{t-1} s_{t-1} + a_3 (1 - d_{t-1}) s_{t-1} + \varepsilon_{1t}$$

The presence of $d_{t-1}s_{t-1}$ and $(1 - d_{t-1})s_{t-1}$ is designed to determine whether
the effects of positive and negative shocks also depend on their size. You can use an
F-statistic to test the null hypothesis $a_1 = a_2 = a_3 = 0$. If you conclude that there is a
leverage effect, you can estimate a specific form of the TARCH or EGARCH model.
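
The general form of the sign bias regression can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `ols_with_f` and `sign_bias_design` are hypothetical, the series `s` is simulated (it is not the NYSE data), and the leverage effect is built in by hand so that the test has something to detect.

```python
import numpy as np


def ols_with_f(y, X):
    """OLS of y on a constant and X; returns (beta, t_stats, F), where F tests all slopes = 0."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    k = Z.shape[1]
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    ssr_u = resid @ resid                    # unrestricted sum of squared residuals
    ssr_r = np.sum((y - y.mean()) ** 2)      # restricted model: constant only
    sigma2 = ssr_u / (n - k)
    t_stats = beta / np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
    f_stat = ((ssr_r - ssr_u) / (k - 1)) / sigma2
    return beta, t_stats, f_stat


def sign_bias_design(s):
    """Build s_t^2 and the regressors d_{t-1}, d_{t-1}*s_{t-1}, (1 - d_{t-1})*s_{t-1}."""
    s = np.asarray(s, dtype=float)
    s_lag = s[:-1]
    d_lag = (s_lag < 0).astype(float)        # dummy equals 1 after a negative shock
    X = np.column_stack([d_lag, d_lag * s_lag, (1.0 - d_lag) * s_lag])
    return s[1:] ** 2, X


# Simulated stand-in for standardized residuals (hypothetical data):
# the shock is scaled up after a negative realization, which builds in leverage.
rng = np.random.default_rng(0)
T = 5000
v = rng.standard_normal(T)
s = np.empty(T)
s[0] = v[0]
for t in range(1, T):
    s[t] = (1.5 if s[t - 1] < 0 else 0.7) * v[t]

y, X = sign_bias_design(s)
beta, t_stats, f_stat = ols_with_f(y, X)
print("coefficients a0..a3:", np.round(beta, 3))
print("t-statistics:", np.round(t_stats, 2))
print("F-statistic for a1 = a2 = a3 = 0:", round(float(f_stat), 2))
```

With real data, `s` would be the standardized residuals from the estimated ARCH or GARCH model, and the F-statistic would be compared with the critical value from an F-table.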


Nonnormal Errors 


For most financial assets, the distribution function for the rate of return is fat tailed. A
fat-tailed distribution has more weight in the tails than a normal distribution. Suppose
that the rate of return on a particular stock has a higher probability of a very large loss (or
gain) than indicated by the normal distribution. As such, you might not want to perform
a maximum likelihood estimation using a normal distribution. Figure 3.12 compares
the standardized normal distribution to a t-distribution with 3 degrees of freedom. You
can see that the t-distribution has more clustering near the mean and far larger tails than
the normal distribution. Since many financial variables have fat tails, many computer
packages allow you to estimate a GARCH model using a t-distribution.

FIGURE 3.12 Comparison of the Normal and t-Distributions (3 degrees of freedom)
[Legend: solid line = normal distribution; dashed line = t-distribution]
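
To put rough numbers on "fat tails," the following sketch compares two-sided tail probabilities of the standard normal with those of a t-distribution with 3 degrees of freedom. It uses only the Python standard library and the closed-form CDF of the t-distribution for 3 degrees of freedom. Note the assumption that the t-variate here is not rescaled to unit variance, so the figures need not match the scaling used in Figure 3.12.

```python
import math


def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def t3_cdf(x):
    """Closed-form CDF of Student's t with 3 degrees of freedom (unscaled)."""
    u = x / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


for c in (2.0, 3.0, 4.0):
    p_normal = 2.0 * (1.0 - normal_cdf(c))   # P(|Z| > c)
    p_t3 = 2.0 * (1.0 - t3_cdf(c))           # P(|T| > c)
    print(f"c = {c:.0f}: normal {p_normal:.5f}, t(3) {p_t3:.5f}")
```

The gap between the two columns widens as `c` grows, which is the sense in which the t-distribution places more weight on extreme realizations.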


10. ESTIMATING THE NYSE U.S. 100 INDEX 


We can illustrate the process of fitting a GARCH model to financial data by using the 
logarithmic change in the NYSE Index of 100 U.S. Stocks shown in Figure 3.3. The 
index is broad based in that the listed firms comprise more than 81% of the capitalized 
value of the entire U.S. market and over 87% of the total company revenues. You can 
follow along using the series labeled RATE in the data set NYSE(RETURNS).XLS. 
The series consists of the total daily returns of the index over the period January 4, 
2000, through July 16, 2012. The series is a good candidate to be a GARCH process;
you can clearly see periods in which there are only small changes in the series (such as 
the 2004—2006 period) and others (such as in 2008) where there are clusters of large 
increases and decreases in the index. 

In Section 4, the main focus of the example of the interest rate spread was to
estimate a model of the mean and to estimate the appropriate conditional confidence
intervals. Here, the model of the mean is of little interest. Asset prices tend to behave
as random walks or as random walks with a drift term. For this reason, there is little
informational content in the model of the mean. Instead, our goal is to accurately
capture the behavior of the conditional volatility. Accurately modeling the conditional
variance requires a large number of observations. Moreover, since financial data are
readily available, GARCH models of asset prices typically use large data sets.


The Model of the Mean 


The first step in modeling any GARCH process is to estimate the model of the mean.
Since the level of the index is clearly nonstationary, the daily rate of return on the
index was constructed as the percentage change in the NYSE's closing total RETURN
measure. For weekdays on which the exchange was closed, the value of the previous
day's RETURN was used. If you examine the file, you will see that the daily rate of
return (RATE or $r_t$) was constructed as


$$r_t = 100 \cdot \ln(\mathrm{RETURN}_t / \mathrm{RETURN}_{t-1})$$
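
As a minimal sketch of this construction, the function below converts a list of index levels into percentage log changes. The function name `log_returns` and the sample levels are hypothetical; the repeated value mimics a weekday on which the exchange was closed and the previous day's level was carried forward.

```python
import math


def log_returns(levels):
    """Return 100 * ln(level_t / level_{t-1}) for each consecutive pair of levels."""
    return [100.0 * math.log(curr / prev) for prev, curr in zip(levels, levels[1:])]


# Hypothetical index levels; the repeated 101.0 stands in for a market holiday.
index_levels = [100.0, 101.0, 101.0, 99.5]
rates = log_returns(index_levels)
print([round(r, 4) for r in rates])
```

A carried-forward level produces a return of exactly zero for that day, which is worth keeping in mind when interpreting the sample variance.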


The 3270 observations in the $\{r_t\}$ series have a mean value of 0.003 and a sample
variance of 1.637. The solid line in Figure 3.13 shows the actual distribution of
these 3270 observations superimposed on the normal and t-distributions plotted in
Figure 3.12. You can see that the distribution of returns is more peaked than the normal
and t-distributions. Moreover, the tails are a bit fatter than a normal distribution but not
as fat as a t-distribution with 3 degrees of freedom. Overall, it makes sense to estimate
the $\{r_t\}$ series using a t-distribution along with the degrees of freedom parameter. We
should anticipate that the estimated degrees of freedom parameter will be more than
three. Most professional software packages can estimate a GARCH process using a
t-distribution. You do not need to specify the degrees of freedom since it can be estimated
along with the other parameters of the model. Since the t-distribution approaches
the normal as the degrees of freedom increase, a large value for the degrees of freedom
estimate indicates that the series appears to be normally distributed.

FIGURE 3.13 Returns of the NYSE Index of 100 Stocks

Although the autocorrelations of the $\{r_t\}$ sequence are all very small, with such a
large number of observations, several appear to be statistically significant. For example,
$\rho_1 = -0.090$ and $\rho_2 = -0.050$. Since $2(3270)^{-1/2} = 0.035$, both of these correlations
are significant at the 5% level. The first choice to make concerns the model of the
mean since the AIC selects an AR(2) model, whereas the SBC selects an MA(1) model.
Consider the AR(2) model estimated over the entire sample period


$$r_t = \underset{(0.209)}{0.0040} - \underset{(-5.42)}{0.0946}\, r_{t-1} - \underset{(-3.29)}{0.0575}\, r_{t-2} + \varepsilon_t \qquad (3.41)$$


Notice that it is possible to eliminate the intercept term from the regression since
the t-statistic is quite low. Nevertheless, there are advantages to using regressions
containing intercept terms. Also, as the t-statistics can change as we posit different models
for the conditional variance, the intercept will be included in the model of the mean.
Once we have found the most appropriate GARCH representation for $h_t$, we can consider
reestimating a model without an intercept term. If you check the ACF of the
residuals, all are insignificant except for the fact that p; and pg equal —0.045. 
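
The two-standard-deviation rule used above is easy to reproduce. The sketch below computes the $2T^{-1/2}$ band for T = 3270 and checks the two reported autocorrelations against it; the helper `sample_acf` (a hypothetical name) shows how the autocorrelations themselves could be computed from a series, assuming the usual estimator that divides by the full-sample sum of squared deviations.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations of x for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


T = 3270
band = 2.0 / math.sqrt(T)                      # approximate 5% significance band
reported = {1: -0.090, 2: -0.050}              # autocorrelations quoted in the text
significant = {lag: abs(rho) > band for lag, rho in reported.items()}
print(f"band = {band:.3f}", significant)
```

Because the band shrinks at rate $T^{-1/2}$, autocorrelations that are economically negligible can still be statistically significant in a sample of this size.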


Testing for GARCH errors 


Given that the model of the mean is satisfactory, we can test for GARCH errors by using
the squared residuals of (3.41). The ACF of the squared residuals is such that $\rho_1 = 0.20$,
$\rho_2 = 0.41$, $\rho_3 = 0.20$, $\rho_4 = 0.29$, and $\rho_5 = 0.33$. The Q-statistics formed using
the correlations of the squared residuals are significant at conventional levels, implying
strong evidence of GARCH errors. It is also possible to test for the presence of GARCH
errors using a Lagrange multiplier test. Let $\hat{\varepsilon}_t^2$ denote the squared residuals from (3.41).
If we use five lags of the $\hat{\varepsilon}_t^2$ series (since there are five workdays in a week), we obtain


$$\hat{\varepsilon}_t^2 = \underset{(5.63)}{0.487} + \underset{(2.78)}{0.047}\,\hat{\varepsilon}_{t-1}^2 + \underset{(18.22)}{0.309}\,\hat{\varepsilon}_{t-2}^2 + \underset{(0.20)}{0.004}\,\hat{\varepsilon}_{t-3}^2 + \underset{(6.14)}{0.104}\,\hat{\varepsilon}_{t-4}^2 + \underset{(13.73)}{0.234}\,\hat{\varepsilon}_{t-5}^2$$


The sample F-statistic for the null hypothesis that all coefficients on the lagged values
of $\{\hat{\varepsilon}_t^2\}$ are equal to zero is 209.98; with 5 degrees of freedom in the numerator and
3275 in the denominator (we restrict five coefficients and lose five usable observations),
the prob-value is 0.000. Hence, we can conclude that there are GARCH errors.
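
The Lagrange multiplier test described above amounts to an auxiliary regression of the squared residuals on their own lags. The following is a sketch rather than the procedure used to produce the reported numbers: the function name `arch_lm_test` is hypothetical, and the two series are simulated (an ARCH(1) process with assumed parameters 0.5 and 0.5, and an i.i.d. series for comparison).

```python
import numpy as np


def arch_lm_test(resid, q):
    """Regress e_t^2 on a constant and q of its own lags; return (F, T*R^2)."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[q:]
    lags = np.column_stack([e2[q - i:len(e2) - i] for i in range(1, q + 1)])
    Z = np.column_stack([np.ones(len(y)), lags])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    ssr_u = np.sum((y - Z @ beta) ** 2)       # unrestricted
    ssr_r = np.sum((y - y.mean()) ** 2)       # all lag coefficients set to zero
    dof = len(y) - (q + 1)
    f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / dof)
    r2 = 1.0 - ssr_u / ssr_r
    return f_stat, len(y) * r2


rng = np.random.default_rng(42)
T = 3000
v = rng.standard_normal(T)

# Simulated ARCH(1) errors: h_t = 0.5 + 0.5 * eps_{t-1}^2 (assumed values)
eps = np.zeros(T)
h = 1.0
for t in range(T):
    if t > 0:
        h = 0.5 + 0.5 * eps[t - 1] ** 2
    eps[t] = v[t] * np.sqrt(h)

f_arch, lm_arch = arch_lm_test(eps, q=5)
f_iid, lm_iid = arch_lm_test(rng.standard_normal(T), q=5)
print("ARCH(1) series: F =", round(float(f_arch), 2))
print("i.i.d. series:  F =", round(float(f_iid), 2))
```

The F-statistic has `q` degrees of freedom in the numerator; the alternative $TR^2$ form returned as the second value is asymptotically chi-square with `q` degrees of freedom.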

Now take a little quiz. How accurately does the lag length need to be estimated to 
perform the test? The obvious answer (“As accurately as possible!”) begs the question. 
Clearly, you do not want to include lags that have very small t-values; including lags 
that are insignificant will reduce the power of the test. If your lag length is too short, 
you could fail to detect the presence of conditional volatility. However, if your lag 
length is shorter than the true structure, and if you still detect GARCH effects, you can 
conclude that GARCH effects are present in the data. To take a simple example, if you
find GARCH effects using only $\hat{\varepsilon}_{t-1}^2$, you can conclude there is some type of ARCH
effect.


Alternative Estimates of the Model


As in the Box-Jenkins method, we want to estimate a parsimonious model. Not only
can we alter the lag lengths for the GARCH(p, q) process, but we can also allow for
ARCH-M effects and for specifications with asymmetry. Given the tremendous number
of possible specifications, it is very easy to overfit the data. Consequently, it is best to
start with a simple model and determine whether or not it is adequate. If it fails any
of the diagnostic checks, it is possible to use a more complicated model. We begin
by estimating (3.41) using a GARCH(1, 1) error process. The results from maximum
likelihood estimation using the normal distribution are as follows:


$$r_t = \underset{(2.82)}{0.043} - \underset{(-3.00)}{0.058}\, r_{t-1} - \underset{(-1.91)}{0.038}\, r_{t-2} + \varepsilon_t \qquad \text{AIC} = 9295.36,\ \text{SBC} = 9331.91$$

$$h_t = \underset{(4.91)}{0.014} + \underset{(9.59)}{0.084}\,\varepsilon_{t-1}^2 + \underset{(98.31)}{0.906}\, h_{t-1}$$


Instead, if we use a t-distribution, we obtain


$$r_t = \underset{(5.24)}{0.061} - \underset{(-3.77)}{0.062}\, r_{t-1} - \underset{(-2.64)}{0.045}\, r_{t-2} + \varepsilon_t \qquad \text{AIC} = 9162.72,\ \text{SBC} = 9205.37$$

$$h_t = \underset{(3.21)}{0.009} + \underset{(8.58)}{0.089}\,\varepsilon_{t-1}^2 + \underset{(95.24)}{0.909}\, h_{t-1}$$


where the estimated degrees of freedom parameter for the t-distribution is
6.14 with a standard error of 0.67. As such, we can conclude that the degrees of freedom
parameter is far from that needed to approximate a normal distribution. Although the
two parameter sets are very close, Figure 3.13 also suggests that we proceed using
the t-distribution. Moreover, if you calculate the sum of squares of the standardized
residuals as in (3.30), you should find SSR' = 3269.42 from the first model and
SSR' = 3225.53 from the model using the t-distribution. Note that in computing the
AIC and SBC using the t-distribution, you want to be sure to count the degrees of
freedom parameter as a regressor.

Since the sum of the coefficients on $\varepsilon_{t-1}^2$ and $h_{t-1}$ is almost identically equal to
unity, we can estimate the IGARCH(1, 1) model using the t-distribution:


$$r_t = \underset{(4.41)}{0.061} - \underset{(-3.25)}{0.062}\, r_{t-1} - \underset{(-2.60)}{0.045}\, r_{t-2} + \varepsilon_t \qquad \text{AIC} = 9160.88,\ \text{SBC} = 9197.43$$

$$h_t = \underset{(5.69)}{0.008} + \underset{(13.00)}{0.090}\,\varepsilon_{t-1}^2 + \underset{(130.72)}{0.910}\, h_{t-1}$$
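
The near-unit sum of the ARCH and GARCH coefficients in the unrestricted GARCH(1, 1) estimates can be checked directly. The sketch below uses the point estimates reported above; the function name `garch11_next_variance` and the illustrative starting values (a unit shock and a unit lagged variance) are assumptions made only for the example.

```python
def garch11_next_variance(eps_lag, h_lag, a0, a1, b1):
    """One step of h_t = a0 + a1 * eps_{t-1}^2 + b1 * h_{t-1}."""
    return a0 + a1 * eps_lag ** 2 + b1 * h_lag


# (a0, a1, b1) as reported for the GARCH(1, 1) models of the text
estimates = {
    "normal": (0.014, 0.084, 0.906),
    "t": (0.009, 0.089, 0.909),
}

# Persistence is the sum of the coefficients on eps_{t-1}^2 and h_{t-1}
persistence = {name: a1 + b1 for name, (a0, a1, b1) in estimates.items()}
print(persistence)

# Illustrative one-step update with eps_{t-1} = 1 and h_{t-1} = 1 (assumed values)
h_next = garch11_next_variance(1.0, 1.0, *estimates["t"])
print(round(h_next, 3))
```

A persistence value this close to one is what motivates imposing the IGARCH restriction rather than estimating the two coefficients freely.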


Note that there are offsetting tendencies regarding the fit of the GARCH specification
versus that of the IGARCH specification. On one hand, the fit of the IGARCH
model will not be as good as that of the GARCH model since the IGARCH model
imposes a constraint on the sum of the coefficients. However, the IGARCH model
is more parsimonious than the GARCH(1, 1) model since there is one fewer coefficient
to actually estimate (i.e., $\beta_1 = 1 - \alpha_1$). Note that the AIC and the SBC select
the IGARCH specification. If you experiment by introducing the second lag $\varepsilon_{t-2}^2$, you
should find that it is not significant. If you try to include additional values of p into an
IGARCH(p, 1) model, you will find that the coefficient of $h_{t-2}$ is negative. Moreover,
the ARCH-M specification is not favorable to the presumption that the return on the
NYSE 100 contains a time-varying risk premium. For example, if we use
the IGARCH(1, 1) specification for $h_t$, we find that the model for the mean is


$$r_t = \underset{(2.69)}{0.049} - \underset{(-3.70)}{0.062}\, r_{t-1} - \underset{(-2.58)}{0.045}\, r_{t-2} + \underset{(1.02)}{0.016}\, h_t + \varepsilon_t$$
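
As a consistency check on the information criteria reported for the three models so far, one can back out the number of estimated parameters from the gap between the SBC and the AIC. The sketch assumes the definitions AIC = -2 ln L + 2n and SBC = -2 ln L + n ln(T), and it assumes T = 3268 usable observations (3270 less the two lags in the mean equation); neither assumption is stated in this section, so treat the result as indicative only.

```python
import math

T = 3268  # assumed usable observations: 3270 less two lags of r_t

# (AIC, SBC) as reported in the text
models = {
    "GARCH(1,1), normal": (9295.36, 9331.91),
    "GARCH(1,1), t": (9162.72, 9205.37),
    "IGARCH(1,1), t": (9160.88, 9197.43),
}


def implied_parameters(aic, sbc, n_obs):
    """Under the assumed definitions, SBC - AIC = n * (ln(T) - 2); solve for n."""
    return (sbc - aic) / (math.log(n_obs) - 2.0)


implied = {name: implied_parameters(aic, sbc, T) for name, (aic, sbc) in models.items()}
for name, n in implied.items():
    print(f"{name}: implied n = {n:.2f}")
```

Under these assumptions the implied counts round to 6, 7, and 6, which agrees with counting the degrees of freedom parameter as a regressor and with the IGARCH restriction saving one coefficient.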


Diagnostic Checking 


Now we need to know whether the IGARCH(1, 1) model passes the various diagnostic
checks for model adequacy. As all diagnostic tests are performed on the standardized
residuals, begin by forming the series $s_t = \hat{\varepsilon}_t / \hat{h}_t^{0.5}$.

Remaining serial correlation: The autocorrelations of the {s,} series are all very 
small; the Ljung—Box Q(5), Q(10), and Q(15) statistics are 3.69, 8.30, and 15.12, 
respectively. None of these values are significant at conventional levels; hence, we 
conclude that the standardized residuals are serially uncorrelated. 

Remaining GARCH effects: It appears that the IGARCH(1, 1) is sufficient to capture
almost all of the GARCH effects. The ACF of the squared standardized residuals is
such that


   ρ1     ρ2     ρ3     ρ4     ρ5     ρ6     ρ7     ρ8     ρ9     ρ10
 -0.05   0.03  -0.01   0.02  -0.01  -0.01   0.03  -0.01   0.02   0.04


Now use the standardized squared residuals $s_t^2$ to estimate a regression of the form

$$s_t^2 = a_0 + a_1 s_{t-1}^2 + \cdots + a_n s_{t-n}^2$$


If you use various values of n, you will find that none of the $a_3$ through $a_n$ are
statistically significant. However, if you use n = 2, you can reject the null hypothesis
$a_1 = a_2 = 0$. Consider


$$s_t^2 = \underset{(23.55)}{0.99} - \underset{(-2.58)}{0.05}\, s_{t-1}^2 + \underset{(1.79)}{0.03}\, s_{t-2}^2$$


The restriction $a_1 = a_2 = 0$ has an F-value of 5.15 and a prob-value of 0.006.
Hence, you can reasonably conclude that there are small but statistically significant
remaining GARCH effects. At this point, it is a judgment call as to whether to flush out
the remaining conditional volatility. With such a large number of observations, even
very small coefficients can readily be found to be statistically significant. As already
discussed, the estimates of higher order GARCH processes are unsatisfactory.
Trying to estimate a pure ARCH process also yields poor results. For example, if you
estimate the conditional variance as an ARCH(12) process, you will find that each
one of the ARCH coefficients will be positive and statistically significant. Although
the 12 lags successfully remove any remaining GARCH effects, the model is clearly
overparameterized. As such, we retain the IGARCH(1, 1) specification.

Leverage effects: If there are no leverage effects, $s_t^2$ should be uncorrelated with the
lagged levels of $\{s_t\}$. However, consider the regression equation


$$s_t^2 = \underset{(28.24)}{0.960} - \underset{(-2.76)}{0.095}\, s_{t-1} - \underset{(-5.18)}{0.178}\, s_{t-2}$$


The coefficients on $s_{t-1}$ and $s_{t-2}$ are highly significant and the F-statistic for the
null hypothesis that the coefficients on the two lagged values are jointly equal to zero
is 17.33 with a prob-value of 0.000. Given that the signs are negative, we conclude that
negative shocks are associated with large values of the conditional variance (i.e., when
$s_{t-1}$ and $s_{t-2}$ are negative, the expected value of $s_t^2$ is large). This result is reinforced by
the Engle-Ng sign test. Set $d_{t-1} = 1$ if $s_{t-1} < 0$; otherwise, set $d_{t-1} = 0$. Now if you
perform the sign bias test, you will find


$$s_t^2 = \underset{(9.63)}{0.658} + \underset{(4.32)}{0.293}\, d_{t-1} + \underset{(2.07)}{0.140}\, d_{t-2} + \underset{(2.96)}{0.201}\, d_{t-3}$$


The coefficients on $d_{t-1}$, $d_{t-2}$, and $d_{t-3}$ are all significant and the F-statistic for
the null hypothesis that the three coefficients are jointly equal to zero is 10.54 with a
prob-value of 0.000. As such, negative values of $s_{t-i}$ are associated with large values
of $s_t^2$. If you use the general form of the test, you will find


$$s_t^2 = \underset{(14.48)}{0.940} + \underset{(2.96)}{0.268}\, d_{t-1} + \underset{(2.20)}{0.104}\, d_{t-1}s_{t-1} - \underset{(-2.41)}{0.130}\,(1 - d_{t-1})s_{t-1}$$


The implication is that there is a leverage effect such that positive shocks are associated
with a diminished variance. Since all terms including the expression $d_{t-1}$ enter
with positive coefficients, the size of the leverage effect depends on the magnitude of
the shock (not just the direction).


The Asymmetric Models


A TARCH model is unsatisfactory since the estimated coefficient on $\varepsilon_{t-1}^2$ is negative.
The estimated equation for the conditional variance is


$$h_t = \underset{(4.70)}{0.010} - \underset{(-2.92)}{0.022}\,\varepsilon_{t-1}^2 + \underset{(9.51)}{0.154}\, d_{t-1}\varepsilon_{t-1}^2 + \underset{(114.46)}{0.933}\, h_{t-1}$$


It is not possible to reestimate the model without the variable $\varepsilon_{t-1}^2$. Recall the argument
demonstrating that $\alpha_1$ must be present in the model for a GARCH(1, 1) model
to be identified. You should be able to show that the identical reasoning applies to
the TARCH model. One possibility is to constrain the coefficients to be positive. An
alternative is to estimate an EGARCH model using a t-distribution. Consider


$$r_t = \underset{(2.88)}{0.038} - \underset{(-3.59)}{0.060}\, r_{t-1} - \underset{(-1.94)}{0.032}\, r_{t-2} \qquad \text{AIC} = 9055.66,\ \text{SBC} = 9104.39$$

$$\ln(h_t) = \underset{(-57.72)}{-0.087} + \underset{(30.50)}{0.108}\,\left|\varepsilon_{t-1}/h_{t-1}^{0.5}\right| - \underset{(-12.12)}{0.129}\,\varepsilon_{t-1}/h_{t-1}^{0.5} + \underset{(387.10)}{0.986}\,\ln(h_{t-1})$$


where the degrees of freedom parameter is 6.88.

All of the coefficients in the equation for $\ln(h_t)$ are highly significant. Notice that
the form of the asymmetry is somewhat different from that shown in Figure 3.11. If
you let $h_{t-1} = 1$, a one-unit decline in $\varepsilon_{t-1}$ will increase the log of conditional volatility
by 0.237 units (0.108 + 0.129 = 0.237). However, a one-unit increase in $\varepsilon_{t-1}$ is
estimated to induce a decline in the log of the conditional variance of 0.021 units
(0.108 - 0.129 = -0.021). The implication is that "bad news" has a large effect on the
conditional volatility but "good news" actually induces a small decrease in volatility.
It is interesting to note that a $\chi^2$-test for the restriction that the two coefficients sum to
zero yields a value of 3.54; with 1 degree of freedom, the prob-value is 0.06.
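
The arithmetic behind the asymmetric response can be written out as a small function. This is a sketch that uses the two estimated EGARCH coefficients quoted above and, as in the text, sets $h_{t-1} = 1$ so that the standardized shock equals the shock itself; the function name `egarch_shock_effect` is hypothetical.

```python
def egarch_shock_effect(z, abs_coef=0.108, level_coef=-0.129):
    """Contribution of a standardized shock z to ln(h_t): abs_coef*|z| + level_coef*z."""
    return abs_coef * abs(z) + level_coef * z


bad_news = egarch_shock_effect(-1.0)    # one-unit decline in the shock
good_news = egarch_shock_effect(+1.0)   # one-unit increase in the shock
print(round(bad_news, 3), round(good_news, 3))
```

Evaluating the function at other values of `z` shows that the asymmetry scales with the size of the shock, since both terms are linear in its magnitude.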

The two most plausible models seem to be the IGARCH and EGARCH models.
The EGARCH model captures a leverage effect such that good news shocks (i.e., positive
shocks) actually decrease volatility, whereas bad news shocks have a large and
positive effect on volatility. The AIC and SBC select the EGARCH model over the
IGARCH(1, 1) model. For the EGARCH model, the AIC is 9055.66 and the SBC is
9104.39. Recall that those for the IGARCH(1, 1) are such that AIC = 9160.88, SBC =
9197.43. The major downside of the EGARCH model is that the asymmetry makes it
very difficult to use for forecasting. As a final check on the adequacy of the EGARCH
model, the following diagnostic checks were performed:


1. Checks of the standardized residuals: The standardized residuals were
checked to determine whether they exhibited serial correlation. Similarly,
the squares of the standardized residuals were checked for serial correlation.
Any correlation in the $\{s_t^2\}$ series implies that there are neglected GARCH
effects in the residuals. It does turn out that there is a small, albeit significant,
amount of serial correlation at the first lag. If we use the LM test for remaining
GARCH errors, we find


$$s_t^2 = \underset{(27.95)}{1.04} - \underset{(-3.07)}{0.054}\, s_{t-1}^2$$


2. Q-plots to determine the distribution of the errors: In order to determine
whether the standardized errors are normally distributed, a standard procedure
is to plot the quantiles of the $\{s_t\}$ sequence against the quantiles of the
normal distribution. After all, if $\{s_t\}$ has a standardized normal distribution,
0.5% of the values should be below -2.58, 2.5% should be below -1.96,
50% should be negative, 95% should be below 1.64, and 99.5% should
be below 2.58. The point is that if $\{s_t\}$ is truly normally distributed, the
quantiles should lie along a straight line when plotted against the quantiles
of the normal distribution. Since the example under consideration uses a
t-distribution, the quantiles of the $s_t$ series can be plotted against the quantiles
of the t-distribution. As you can see in Figure 3.14, except for one extreme
observation (not shown in the figure), the standardized residuals do appear to
have a t-distribution.
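
The reference quantiles of the standard normal distribution, and the pairing of sample and theoretical quantiles that underlies a Q-plot, can be sketched with the Python standard library. The helper name `qq_pairs`, the plotting positions (i - 0.5)/n, and the sample values are assumptions made for illustration; a t-distribution version would need a t quantile function, which the standard library does not provide.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Reference quantiles of the standard normal distribution
probs = [0.005, 0.025, 0.5, 0.95, 0.995]
quantiles = {p: std_normal.inv_cdf(p) for p in probs}
for p, q in quantiles.items():
    print(f"{100 * p:5.1f}% of values lie below {q:+.3f}")


def qq_pairs(sample):
    """Pair each sorted observation with the normal quantile at position (i - 0.5)/n."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


# Hypothetical standardized residuals
pairs = qq_pairs([0.3, -1.2, 0.8, -0.1, 1.9, -0.6])
```

If the sample is drawn from the reference distribution, the points in `pairs` should lie close to a 45-degree line.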


Figure 3.15 shows the fitted values of $h_t$ for the period from January 4, 2000,
through July 16, 2012. It should be clear that there is a volatile period beginning in
mid-2002, a relatively tranquil period from mid-2003 to early 2007, and huge increases
in $h_t$ at the time of the financial crisis and toward the middle of 2011.

FIGURE 3.14 Standardized Errors and Fractiles of the Distribution
[Axes: quantiles of the t-distribution; quantiles of standardized residuals]

FIGURE 3.15 The Estimated Variance


11. MULTIVARIATE GARCH 


If you have a data set with several variables, it often makes sense to estimate the
conditional volatilities of the variables simultaneously. Multivariate GARCH models take
advantage of the fact that the contemporaneous shocks to variables can be correlated
with each other. Moreover, multivariate GARCH models allow for volatility spillovers
in that volatility shocks to one variable might affect the volatility of other related
variables. For example, instead of simply modeling the NYSE's U.S. 100, suppose that
we also wanted to model the NYSE Composite Index. Although we could separately
model the variance of each index, we might expect the volatilities of the two to be interrelated.
After all, shocks that increase the uncertainty of one index are likely to increase the
uncertainty of the other. (If you are comfortable with matrix algebra, you may want to
look at the first part of Appendix 3.1 in the Supplementary Manual before proceeding.)

To keep the analysis as simple as possible, suppose there are just two variables, $y_{1t}$
and $y_{2t}$. For now, we are not interested in the means of the series so we can consider
only the two error processes


$$\varepsilon_{1t} = v_{1t}(h_{11t})^{0.5}$$

$$\varepsilon_{2t} = v_{2t}(h_{22t})^{0.5}$$


As in the univariate case, if we assume $\mathrm{var}(v_{1t}) = \mathrm{var}(v_{2t}) = 1$, we can think of
$h_{11t}$ and $h_{22t}$ as the conditional variances of $\varepsilon_{1t}$ and $\varepsilon_{2t}$, respectively. Since we want to
allow for the possibility that the shocks are correlated, denote $h_{12t}$ as the conditional
covariance between the two shocks. Specifically, let $h_{12t} = E_{t-1}\varepsilon_{1t}\varepsilon_{2t}$.

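As detailed in Appendix 3.1 in the Supplementary Manual, a natural way to construct
a multivariate GARCH(1, 1) process is to allow all of the volatility terms to
interact with each other. Consider the so-called vech model


$$h_{11t} = c_{10} + \alpha_{11}\varepsilon_{1t-1}^2 + \alpha_{12}\varepsilon_{1t-1}\varepsilon_{2t-1} + \alpha_{13}\varepsilon_{2t-1}^2 + \beta_{11}h_{11t-1} + \beta_{12}h_{12t-1} + \beta_{13}h_{22t-1} \qquad (3.42)$$

$$h_{12t} = c_{20} + \alpha_{21}\varepsilon_{1t-1}^2 + \alpha_{22}\varepsilon_{1t-1}\varepsilon_{2t-1} + \alpha_{23}\varepsilon_{2t-1}^2 + \beta_{21}h_{11t-1} + \beta_{22}h_{12t-1} + \beta_{23}h_{22t-1} \qquad (3.43)$$

$$h_{22t} = c_{30} + \alpha_{31}\varepsilon_{1t-1}^2 + \alpha_{32}\varepsilon_{1t-1}\varepsilon_{2t-1} + \alpha_{33}\varepsilon_{2t-1}^2 + \beta_{31}h_{11t-1} + \beta_{32}h_{12t-1} + \beta_{33}h_{22t-1} \qquad (3.44)$$


Here, the conditional variance of each variable ($h_{11t}$ and $h_{22t}$) depends on its own past,
the past of the other variable, the conditional covariance between the two variables
($h_{12t}$), the lagged squared errors ($\varepsilon_{1t-1}^2$ and $\varepsilon_{2t-1}^2$), and the product of lagged errors
($\varepsilon_{1t-1}\varepsilon_{2t-1}$). Notice that the conditional covariance depends on the same set of
variables. Clearly, there is a rich interaction between the variables. For example, after one
period, a $v_{1t}$ shock affects $h_{11t+1}$, $h_{12t+1}$, and $h_{22t+1}$.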

Although simple to conceptualize, multivariate GARCH models in the form of
(3.42)-(3.44) can be very difficult to estimate. Some of the details concerning maximum
likelihood estimation are contained in Appendix 3.1. Nevertheless, it should be
clear that:


■ The number of parameters necessary to estimate can get quite large. In the
two-variable case above, there are 21 parameters. The number grows very
quickly as more variables are added to the system and as the order of the
GARCH process increases. If you understand the nature of the multivariate
model above, you should be able to show that a GARCH(2, 1) model necessitates
the estimation of nine additional parameters. You can also verify that, in
the three-variable case, a GARCH(1, 1) model contains six equations (since
there are equations for $h_{11t}$, $h_{22t}$, $h_{33t}$, $h_{12t}$, $h_{13t}$, and $h_{23t}$) and that each of the
equations entails the estimation of 12 coefficients plus a constant.
Moreover, we have not begun to specify the models of the mean. If we have
two variables $y_{1t}$ and $y_{2t}$, it is possible to estimate the means by specifying
$y_{1t} - \mu_1 = \varepsilon_{1t}$ and $y_{2t} - \mu_2 = \varepsilon_{2t}$. Once lagged values of $\{y_{1t}\}$ and $\{y_{2t}\}$
and/or explanatory variables are added to the mean equation, it should be clear
that the estimation problem can be quite complicated.

■ As in the univariate case, there is not an analytic solution to the maximization
problem. As such, it is necessary to use numerical methods to find the
parameter values that maximize the function L. Unfortunately, such methods
may not be able to find a maximum value if the model is overparameterized.
To explain, if a coefficient is small relative to its standard error, it necessarily
has a large confidence interval. As such, there is a large range in which
the coefficient may lie and slight changes in the coefficient's value will have
little influence on the value of L. The numerical "hill climbing" techniques
that computers use in their maximization routines will have difficulty pinning
down the value of such a coefficient. Hence, when attempting to estimate an
overparameterized model, it is typical for a software package to indicate that
its search algorithm did not converge.

■ Since conditional variances are necessarily positive, the restrictions for the
multivariate case are far more complicated than for the univariate case. The
results of the maximization problem must be such that each one of the conditional
variances is always positive and that the implied correlation coefficients,
$\rho_{ij} = h_{ijt}/(h_{iit}h_{jjt})^{0.5}$, are between -1 and +1.
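
The parameter counts quoted in the first point follow from simple bookkeeping, sketched below. The function name `vech_parameter_count` and the argument names are hypothetical; the count covers only the variance and covariance equations, not the models of the mean.

```python
def vech_parameter_count(n_vars, p=1, q=1):
    """Equations, parameters per equation, and total for an unrestricted vech GARCH(p, q)."""
    m = n_vars * (n_vars + 1) // 2     # distinct variances and covariances
    per_equation = 1 + m * (p + q)     # a constant plus m terms for every lag included
    return m, per_equation, m * per_equation


two_var = vech_parameter_count(2)               # (3, 7, 21)
two_var_extra_lag = vech_parameter_count(2, p=2, q=1)
three_var = vech_parameter_count(3)             # (6, 13, 78)

print(two_var, two_var_extra_lag[2] - two_var[2], three_var)
```

The count grows with the square of the number of equations, which is why unrestricted systems with more than a few variables quickly become impractical.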


In order to circumvent these problems, much of the recent work involving multivariate
GARCH modeling involves finding suitable restrictions on the general model of
(3.42)-(3.44). One set of restrictions that became popular in the early literature is the
so-called diagonal vech model. The idea is to diagonalize the system such that $h_{ijt}$ contains
only lags of itself and the cross products of $\varepsilon_{it}\varepsilon_{jt}$. For example, the diagonalized
version of (3.42)-(3.44), called the diagonal vech, is given by


$$h_{11t} = c_{10} + \alpha_{11}\varepsilon_{1t-1}^2 + \beta_{11}h_{11t-1}$$

$$h_{12t} = c_{20} + \alpha_{22}\varepsilon_{1t-1}\varepsilon_{2t-1} + \beta_{22}h_{12t-1}$$

$$h_{22t} = c_{30} + \alpha_{33}\varepsilon_{2t-1}^2 + \beta_{33}h_{22t-1}$$

Given the large number of restrictions, the model is relatively easy to estimate.
Each conditional variance is equivalent to that of a univariate GARCH process and the
conditional covariance is quite parsimonious as well. The problem is that setting all
$\alpha_{ij} = \beta_{ij} = 0$ (for $i \neq j$) means that there are no interactions among the variances. An $\varepsilon_{1t-1}$
shock, for example, affects $h_{11t}$ and $h_{12t}$ but does not affect the conditional variance $h_{22t}$.
Notice that the system-wide estimation does have the advantage of controlling for the
contemporaneous correlation of the residuals across equations.


Baba, Engle, Kraft, and Kroner (1991) and Kroner (1995) popularized what is now
called the BEKK model that ensures that the conditional variances are positive. The
idea is to force all of the parameters to enter the model via quadratic forms ensuring
that all the variances are positive. Although there are several different variants of the
model, consider the specification


$$H_t = C'C + A'\varepsilon_{t-1}\varepsilon_{t-1}'A + B'H_{t-1}B$$


where for the two-variable case considered in (3.42)-(3.44),


$$H_t = \begin{bmatrix} h_{11t} & h_{12t} \\ h_{12t} & h_{22t} \end{bmatrix}, \quad C = \begin{bmatrix} c_{11} & c_{12} \\ c_{12} & c_{22} \end{bmatrix}, \quad A = \begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{bmatrix}, \quad B = \begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix} \qquad (3.45)$$


For example, if you perform the indicated matrix multiplications, you will find


$$h_{11t} = (c_{11}^2 + c_{12}^2) + (\alpha_{11}^2\varepsilon_{1t-1}^2 + 2\alpha_{11}\alpha_{21}\varepsilon_{1t-1}\varepsilon_{2t-1} + \alpha_{21}^2\varepsilon_{2t-1}^2) + (\beta_{11}^2 h_{11t-1} + 2\beta_{11}\beta_{21}h_{12t-1} + \beta_{21}^2 h_{22t-1})$$


In general, $h_{ijt}$ will depend on the squared residuals, cross-products of the residuals,
and the conditional variances and covariances of all variables in the system. As such,
the model allows for shocks to the variance of one of the variables to "spill
over" to the others. The problem is that the BEKK formulation can be quite difficult to
estimate. The model has a large number of parameters that are not globally identified.
Changing the signs of all elements of A, B, or C will have no effect on the value of the
likelihood function. As such, convergence can be quite difficult to achieve.
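
The matrix expression can be verified numerically. The sketch below evaluates one step of the BEKK recursion for the two-variable case, compares the (1, 1) element with the hand-expanded expression for the first conditional variance, and illustrates the identification problem by flipping the sign of every element of A. All numerical values are hypothetical and chosen only for illustration; the function name `bekk_update` is likewise an assumption.

```python
import numpy as np


def bekk_update(C, A, B, eps_lag, H_lag):
    """One step of H_t = C'C + A' e e' A + B' H_{t-1} B, with e the lagged shock vector."""
    e = np.asarray(eps_lag, dtype=float).reshape(-1, 1)
    return C.T @ C + A.T @ e @ e.T @ A + B.T @ H_lag @ B


# Hypothetical parameter matrices, lagged shocks, and lagged covariance matrix
C = np.array([[0.3, 0.1], [0.1, 0.2]])
A = np.array([[0.30, 0.05], [0.10, 0.25]])
B = np.array([[0.90, 0.02], [0.03, 0.85]])
eps_lag = np.array([1.2, -0.7])
H_lag = np.array([[1.0, 0.4], [0.4, 0.8]])

H = bekk_update(C, A, B, eps_lag, H_lag)

# Hand expansion of the (1, 1) element
e1, e2 = eps_lag
h11_by_hand = (
    (C[0, 0] ** 2 + C[1, 0] ** 2)
    + (A[0, 0] ** 2 * e1 ** 2 + 2 * A[0, 0] * A[1, 0] * e1 * e2 + A[1, 0] ** 2 * e2 ** 2)
    + (B[0, 0] ** 2 * H_lag[0, 0] + 2 * B[0, 0] * B[1, 0] * H_lag[0, 1]
       + B[1, 0] ** 2 * H_lag[1, 1])
)

H_flipped = bekk_update(C, -A, B, eps_lag, H_lag)   # sign change leaves H_t unchanged
eigenvalues = np.linalg.eigvalsh(H)                 # all positive: H_t is positive definite

print(np.round(H, 4))
print(round(float(h11_by_hand), 4), np.allclose(H, H_flipped), eigenvalues > 0)
```

Because every term is a quadratic form, the resulting matrix is positive definite whenever C is nonsingular, regardless of the signs of the individual coefficients.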

Another popular multivariate GARCH specification is the constant conditional
correlation model. As the name suggests, the constant correlation coefficient (CCC) model
restricts the correlation coefficients to be constant. As such, for each $i \neq j$, the CCC
model assumes $h_{ijt} = \rho_{ij}(h_{iit}h_{jjt})^{0.5}$. In a sense, the CCC model is a compromise in that
the variance terms need not be diagonalized, but the covariance terms are always
proportional to $(h_{iit}h_{jjt})^{0.5}$. For example, a CCC model could consist of (3.42), (3.44), and


$$h_{12t} = \rho_{12}(h_{11t}h_{22t})^{0.5}$$


Hence, the covariance equation entails only one parameter instead of the seven
parameters appearing in (3.43).
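
Under the CCC restriction, the whole conditional covariance matrix follows from the conditional variances and a fixed correlation matrix. A minimal sketch, with a hypothetical function name `ccc_covariance` and made-up values for the variances and the correlation:

```python
import numpy as np


def ccc_covariance(variances, R):
    """Build H_t with h_ii on the diagonal and h_ij = rho_ij * sqrt(h_ii * h_jj) elsewhere."""
    sd = np.sqrt(np.asarray(variances, dtype=float))
    return np.asarray(R, dtype=float) * np.outer(sd, sd)


# Hypothetical conditional variances at date t and a constant correlation of 0.5
variances_t = [4.0, 9.0]
R = np.array([[1.0, 0.5], [0.5, 1.0]])

H_t = ccc_covariance(variances_t, R)
print(H_t)
```

Only the variances change from period to period; the correlation matrix `R` is estimated once, which is where the reduction in the number of parameters comes from.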

Bollerslev (1990) illustrates the usefulness of the CCC specification by examining
the weekly values of the nominal exchange rates for five different countries (the
German mark (DM), the French franc (FF), the Italian lira (IL), the Swiss franc (SF),
and the British pound (BP)) relative to the U.S. dollar. Note that a five-equation system
would be too unwieldy to estimate in an unrestricted form. For the model of the
mean, the log of each exchange rate series was modeled as a random walk plus a drift.
If $y_{it}$ is the percentage change in the nominal exchange rate for country i, the model of
the mean for each country is simply


$$y_{it} = \mu_i + \varepsilon_{it} \qquad (3.46)$$


Ljung-Box tests indicated that each series of residuals did not contain any serial
correlation. This is consistent with the fact that nominal exchange rates behave as
random-walk processes when using high-frequency data. As a next step, Bollerslev
(1990) tested the squared residuals for serial dependence. He reports that the autocorrelations
of the squared residuals are strongly indicative of GARCH effects. For example,
for the British pound, the Q(20)-statistic has a value of 113.020; this is significant at
any conventional level. Given the presence of conditional heteroskedasticity, Bollerslev
next turned to finding the appropriate orders for the five GARCH(p, q) processes.
Individually, each residual series could be well estimated as a GARCH(1, 1) process.
As such, the specification for the full model has the form of (3.46) plus

$$h_{iit} = c_{i0} + \alpha_{ii}\varepsilon_{it-1}^2 + \beta_{ii}h_{iit-1} \qquad (i = 1, \ldots, 5)$$

$$h_{ijt} = \rho_{ij}(h_{iit}h_{jjt})^{0.5} \qquad (i \neq j)$$

Notice that the full model requires that only 30 parameters be estimated (5 values
of $\mu_i$, the 5 equations for $h_{iit}$ each have 3 parameters, and 10 values of the $\rho_{ij}$). He reports
that, with 333 observations, the required number of matrix inversions is reduced from
10,323 to 31. Also notice that the CCC model has an important advantage over the separate
estimation of each equation. As in a seemingly unrelated regression framework,
the system-wide estimation provided by the CCC model captures the contemporaneous
correlation between the various error terms. As such, the coefficient estimates of the
GARCH process are more efficient than those from a set of single equation estimations.
The estimated correlations for the period during which the European Monetary System
(EMS) prevailed are


        DM      FF      IL      SF
FF    0.932
IL    0.886   0.876
SF    0.917   0.866   0.816
BP    0.674   0.678   0.622   0.635


It is interesting that correlations among continental European currencies were all 
far greater than those for the pound. Moreover, the correlations were much greater 
than those of the pre-EMS period. Clearly, EMS acted to keep the exchange rates of 
Germany, France, Italy, and Switzerland tightly in line prior to the introduction of 
the euro. 

If you are familiar with matrix algebra, the last part of Appendix 3.1 shows you 
how to generalize Bollerslev’s method so as to estimate time-varying (or dynamic) 
conditional correlations. 


Updating the Study 


The file labeled EXRATES(DAILY).XLS contains the 3475 daily values of the euro, 
British pound, and Swiss franc over the January 3, 2000—April 26, 2013 period. The 
time paths of the three series are shown in Figure 3.5. Denote the U.S. dollar value
of each of these nominal exchange rates as $e_{it}$, where i = EU, BP, and SW. If you
plot the three currencies, you will see that all three tend to move together through
mid-2008. However, the comovements seem to weaken in the later part of the sample.
As a preliminary step, construct the logarithmic change of each nominal exchange
rate as $y_{it} = \log(e_{it}/e_{it-1})$. As in any GARCH estimation, the first step is to estimate
the model of the mean. If you follow Bollerslev (1990) and estimate equations in the
form of (3.46), you should obtain the means as


    euro              BP               SW
7.16 × 10^-5    -1.01 × 10^-5    1.49 × 10^-4
   (0.66)           (0.14)           (1.26)


The residual autocorrelations are all very small in magnitude, and none is signifi- 
cant using the Ljung—Box Q(4) or Q(8) test. For example, the autocorrelations for the 
euro are 


  ρ1       ρ2       ρ3       ρ4       ρ5       ρ6
0.012   -0.026    0.003    0.022    0.006   -0.014


With T = 3474, there is no reason to incorporate any lagged changes in the mean 
equation. 

For the second step, you should check the squared residuals for the presence of 
GARCH errors. Since we are using daily data (with a 5-day week), it seems reasonable 
to begin using a model of the form 


$$\hat{\varepsilon}_t^2 = \alpha_0 + \sum_{i=1}^{5} \alpha_i \hat{\varepsilon}_{t-i}^2$$


The sample values of the F-statistics for the null hypothesis that $\alpha_1 = \cdots = \alpha_5 = 0$
are 24.72, 65.45, and 5.80 for the euro, BP, and SW, respectively. Since all of these
values are highly significant, it is possible to conclude that all three series exhibit 
GARCH errors. 
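The pretest in this second step can be sketched as follows. The function name and the simulated series are assumptions made for illustration; the data in EXRATES(DAILY).XLS are not used here. The sketch regresses the squared residuals on a constant and five of their own lags and forms the F-statistic for the null hypothesis that all five lag coefficients are zero.

```python
import numpy as np

def arch_f_test(resid, lags=5):
    """F-statistic for H0: alpha_1 = ... = alpha_lags = 0 in
    e_t^2 = alpha_0 + sum_i alpha_i * e_{t-i}^2 + error."""
    e2 = np.asarray(resid, dtype=float) ** 2
    n = len(e2)
    y = e2[lags:]
    # Column i holds the i-th lag of the squared residuals.
    lagged = [e2[lags - i:n - i] for i in range(1, lags + 1)]
    X = np.column_stack([np.ones(len(y))] + lagged)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr_unrestricted = np.sum((y - X @ beta) ** 2)
    ssr_restricted = np.sum((y - y.mean()) ** 2)   # constant-only model
    dof = len(y) - (lags + 1)
    return ((ssr_restricted - ssr_unrestricted) / lags) / (ssr_unrestricted / dof)

# Simulated data (assumed parameters): an i.i.d. series and an ARCH(1) series.
rng = np.random.default_rng(0)
T = 2000
v = rng.standard_normal(T)
arch = np.zeros(T)
for t in range(1, T):
    arch[t] = v[t] * np.sqrt(1.0 + 0.5 * arch[t - 1] ** 2)

f_iid = arch_f_test(v)
f_arch = arch_f_test(arch)
print(f_iid, f_arch)
```

The statistic for the ARCH series should be far larger than the one for the i.i.d. series; in practice it would be compared with an F critical value with 5 numerator degrees of freedom.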

For the third step, you should try to find the proper form of the GARCH model for 
each exchange rate series. Although some other GARCH forms (such as the IGARCH 
model) might seem more appropriate than Bollerslev’s specification, proceed as if the 
GARCH(1, 1) model is appropriate for each series. If you estimate the three series as 
a multivariate GARCH(1, 1) process using the CCC restriction, you should find the 
results reported in Table 3.1. 

If we let the numbers 1, 2, and 3 represent the euro, pound, and franc, the correlations
are $\rho_{12} = 0.68$, $\rho_{13} = 0.87$, and $\rho_{23} = 0.60$. As in Bollerslev’s paper, the pound
and the franc continue to have the lowest correlation coefficients. 


Table 3.1 The CCC Model of Exchange Rates 


               c             α1           β1

Euro     1.32 × 10^-7      0.047        0.951
            (2.44)        (10.79)     (240.91)

Pound    2.42 × 10^-7      0.040        0.953
            (3.28)         (7.71)     (149.15)

Franc    2.16 × 10^-7      0.059        0.940
            (2.57)        (12.82)     (215.36)


Note: t-statistics in parentheses.

By way of contrast, it is instructive to estimate the model using a standard diagonal
vech specification such that

$$h_{ijt} = c_{ij} + \alpha_{ij}\varepsilon_{it-1}\varepsilon_{jt-1} + \beta_{ij}h_{ijt-1}$$

The estimation results are given in Table 3.2. Now, the correlation coefficients are
time varying. For example, the correlation coefficient between the pound and the franc
is given by $h_{23t}/(h_{22t}h_{33t})^{0.5}$. The time path of this correlation coefficient is shown in
Figure 3.16. Although the correlation does seem to fluctuate around 0.60 (the value
found by the CCC method), there are substantial departures from this average value.
Beginning in mid-2006, the correlation between the pound and the franc began a long
and steady decline ending in early 2008. The correlation increased with fears of a U.S.
recession and then fell sharply with the onset of the U.S. financial crisis in the fall of
2008. Notice that the correlation actually becomes negative for a short period in early
2009. Thereafter, the correlation rises to nearly 0.80.
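To see why the diagonal vech correlations move over time, the recursion can be written element by element. The sketch below uses a two-variable system with made-up coefficients and starting values (none are taken from Table 3.2), and the function names are hypothetical.

```python
import numpy as np

def diagonal_vech_step(H_prev, eps_prev, C, A, B):
    """One update of h_ijt = c_ij + alpha_ij*eps_it-1*eps_jt-1 + beta_ij*h_ijt-1.
    C, A, and B are symmetric matrices holding c_ij, alpha_ij, and beta_ij."""
    eps_prev = np.asarray(eps_prev, dtype=float)
    return C + A * np.outer(eps_prev, eps_prev) + B * H_prev

def correlation(H, i, j):
    """Conditional correlation h_ij / (h_ii * h_jj)^0.5."""
    return H[i, j] / np.sqrt(H[i, i] * H[j, j])

# Made-up inputs for illustration only.
C = np.array([[0.10, 0.05], [0.05, 0.10]])
A = np.full((2, 2), 0.1)
B = np.full((2, 2), 0.8)
H_prev = np.array([[1.0, 0.5], [0.5, 1.0]])

H_new = diagonal_vech_step(H_prev, [1.0, -1.0], C, A, B)
print(correlation(H_prev, 0, 1), correlation(H_new, 0, 1))
```

Shocks of opposite sign pull the covariance down while both variances rise, so the correlation falls from 0.5 to 0.35 in this example. Under the CCC restriction it would have stayed fixed.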


Table 3.2 Estimates Using the Diagonal vech Model 


         h11t           h12t           h13t           h22t           h23t           h33t
c    4.01 × 10^-7   2.50 × 10^-7   4.45 × 10^-7   2.62 × 10^-7   2.32 × 10^-7   5.88 × 10^-7
       (18.47)         (6.39)        (33.82)         (4.31)         (6.39)        (10.79)
α1      0.047          0.035          0.047          0.037          0.033          0.050
       (14.51)        (11.89)        (14.97)         (9.59)        (12.01)        (14.07)
β1      0.946          0.956          0.945          0.956          0.959          0.941
      (319.44)       (268.97)       (339.91)       (205.04)       (309.29)       (270.55)


Note: t-statistics in parentheses.

FIGURE 3.16 Pound/Franc Correlation from the Diagonal vech


12. VOLATILITY IMPULSE RESPONSES


As we learned from the Great Recession, a shock to one market can readily spill over
into other markets. Although much of the financial crisis originated in the housing market,
a great deal of uncertainty was created in interest rate sensitive sectors such as automobile
production. Such volatility spillovers can readily be captured by a multivariate
GARCH model in that it allows an $\varepsilon_{it}$ shock to affect the variances and covariances of
every variable. For example, if you update (3.42)–(3.44) by one period, it should be
clear that an $\varepsilon_{1t}$ shock will affect $h_{11t+1}$, $h_{12t+1}$, and $h_{22t+1}$. Yet, the story does not end
in period t + 1 because of volatility persistence. Obviously, if the values of the $h_{ijt+1}$
are large, we would anticipate that the values of the $h_{ijt+2}$ will be large as well.

Although the mathematical specification of the GARCH model does allow for a 
rich set of interactions among the volatilities, the large number of parameters estimated 
means that it is near impossible to interpret the magnitudes of the spillovers from an 
examination of the estimated coefficients alone. Hafner and Herwartz (2006) show how 
to construct a volatility impulse response function so as to plot out the consequences 
of volatility shocks on the entire system. 

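In the two-variable case given by (3.42)–(3.44), the volatility forecasts for period
t + 1 are straightforward. For example, from (3.42), the one-step-ahead forecast for
$h_{11t+1}$ is

$$E_t h_{11t+1} = c_{10} + \alpha_{11}\varepsilon_{1t}^2 + \alpha_{12}\varepsilon_{1t}\varepsilon_{2t} + \alpha_{13}\varepsilon_{2t}^2 + \beta_{11}h_{11t} + \beta_{12}h_{12t} + \beta_{13}h_{22t}$$

Now if you update the equation for $h_{11t}$ by two periods and take the conditional
expectation, you should obtain

$$E_t h_{11t+2} = c_{10} + \alpha_{11}E_t\varepsilon_{1t+1}^2 + \alpha_{12}E_t\varepsilon_{1t+1}\varepsilon_{2t+1} + \alpha_{13}E_t\varepsilon_{2t+1}^2 + \beta_{11}E_t h_{11t+1} + \beta_{12}E_t h_{12t+1} + \beta_{13}E_t h_{22t+1}$$

As in (3.22), $E_t\varepsilon_{1t+1}^2 = E_t h_{11t+1}$, and since $E_t\varepsilon_{1t+1}\varepsilon_{2t+1} = E_t h_{12t+1}$, it follows that

$$E_t h_{11t+2} = c_{10} + (\alpha_{11} + \beta_{11})E_t h_{11t+1} + (\alpha_{12} + \beta_{12})E_t h_{12t+1} + (\alpha_{13} + \beta_{13})E_t h_{22t+1}$$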


In principle, it is possible to solve the entire system recursively so as to obtain the 
variance and covariance forecasts for every variable in the model. Fortunately, this is 
unnecessary since most professional statistical software packages can make the appro- 
priate calculations to obtain the entire set of volatility forecasts. Of course, the forecasts 
will change if any of the values of the €;, (or h,;,) are allowed to change. The differences 
in the volatility forecasts for any two sets of the initial values comprise the volatility 
impulse functions. 

Formally, Hafner and Herwartz (2006) define the volatility impulse response function
for $h_{11t}$ as follows. Let the information set at T consist of all values of $\varepsilon_{it}$ and $h_{ijt}$
for t = 1, 2, ..., T and form the conditional volatility forecasts $E_T h_{11T+i}$ (i = 1, 2, ...).
Now suppose that we disturb one or more of the $\varepsilon_{iT}$ by some given amount and again
obtain the volatility forecasts. Call these forecasts $E_{T^*} h_{11T+i}$. The differences between
the two sets of forecasts (i.e., $E_{T^*} h_{11T+i} - E_T h_{11T+i}$) comprise the variance impulse
response function. In essence, the differences between the forecasts measure the
influences of the shocks. Oftentimes, an external shock will simultaneously affect $\varepsilon_{1T}$
and $\varepsilon_{2T}$. As shown in the example below, it is possible to plot the volatility effects of
this external shock.
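The recursive forecasts and the resulting impulse responses can be sketched for a bivariate vech system stacked as $h_t = c + A\eta_{t-1} + Bh_{t-1}$, where $\eta_t$ collects $(\varepsilon_{1t}^2, \varepsilon_{1t}\varepsilon_{2t}, \varepsilon_{2t}^2)$ and $h_t$ collects $(h_{11t}, h_{12t}, h_{22t})$. The stacking, the function names, and every number below are assumptions made for illustration; they are not estimates from the text.

```python
import numpy as np

def volatility_forecasts(c, A, B, eta_T, h_T, horizon):
    """Return E_T h_{T+j} for j = 1, ..., horizon."""
    out = np.empty((horizon, len(c)))
    out[0] = c + A @ eta_T + B @ h_T          # one-step-ahead forecast
    for j in range(1, horizon):
        # Beyond one step, E_T eta_{T+j} = E_T h_{T+j}, so A and B combine.
        out[j] = c + (A + B) @ out[j - 1]
    return out

def stack_shocks(e1, e2):
    """Stack (eps_1^2, eps_1*eps_2, eps_2^2)."""
    return np.array([e1 * e1, e1 * e2, e2 * e2])

# Hypothetical coefficients and initial conditions.
c = np.array([0.02, 0.01, 0.03])
A = np.array([[0.05, 0.01, 0.00],
              [0.00, 0.04, 0.00],
              [0.00, 0.01, 0.06]])
B = np.array([[0.90, 0.02, 0.00],
              [0.00, 0.92, 0.00],
              [0.00, 0.02, 0.89]])
h_T = np.array([1.0, 0.5, 1.2])

eta_base = stack_shocks(0.5, -0.2)    # baseline shocks at T
eta_shock = stack_shocks(2.0, 1.5)    # disturbed shocks at T
baseline = volatility_forecasts(c, A, B, eta_base, h_T, 20)
shocked = volatility_forecasts(c, A, B, eta_shock, h_T, 20)
virf = shocked - baseline             # one row per forecast horizon
```

In this linear setting the intercepts and initial variances cancel, so the response at horizon j + 1 equals $(A + B)^j A$ times the difference between the two shock vectors.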


An Example 


As you can see in Figure 3.5, during the last part of October and the early part of 
November 2008, there were several large exchange rate shocks resulting from the finan- 
cial crisis. One way to estimate the influence of these shocks on the exchange rate 
volatilities is to reestimate the series in the file EXRATES(DAILY).XLS and then cre- 
ate a variance impulse function. For our purposes here, it is desirable to reestimate 
the model since the constant conditional correlation restriction does not allow for any 
interesting volatility spillovers. To keep the analysis simple, first estimate the euro and 
pound exchange rates as a multivariate GARCH(1, 1) process using the BEKK speci- 
fication. In terms of (3.45), you should find 


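$$A = \begin{bmatrix} 0.132 & -0.031 \\ 0.028 & 0.214 \end{bmatrix}, \qquad B = \begin{bmatrix} 0.993 & 0.008 \\ 0.010 & 0.971 \end{bmatrix}$$


where the intercepts are $c_{10} = 0.000360$, $c_{20} = 0.000403$, and $c_{30} = 0.000275$. If you
form $h_{11t}$ as in (3.45), you should find


$$h_{11t} = 0.000360 + 0.0174\varepsilon_{1t-1}^2 + 0.00739\varepsilon_{1t-1}\varepsilon_{2t-1} + 0.00078\varepsilon_{2t-1}^2 + 0.986h_{11t-1} + 0.020h_{12t-1} + 0.0001h_{22t-1}$$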
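As a quick check on the arithmetic, the slope coefficients in the equation for the first conditional variance can be recovered from the first columns of the two estimated BEKK matrices. The sketch below assumes the quadratic-form ordering M'XM, which is the ordering that reproduces the reported numbers; the helper name is hypothetical.

```python
import numpy as np

# Estimated BEKK matrices as reported in the text.
A = np.array([[0.132, -0.031],
              [0.028,  0.214]])
B = np.array([[0.993, 0.008],
              [0.010, 0.971]])

def h11_weights(M):
    """Weights on the (1,1), (1,2), and (2,2) elements of a symmetric 2 x 2
    matrix X in the first diagonal entry of M' X M."""
    m11, m21 = M[0, 0], M[1, 0]
    return np.array([m11 ** 2, 2.0 * m11 * m21, m21 ** 2])

arch_weights = h11_weights(A)    # weights on eps_1^2, eps_1*eps_2, eps_2^2
garch_weights = h11_weights(B)   # weights on h_11, h_12, h_22
print(np.round(arch_weights, 5))
print(np.round(garch_weights, 4))

# Cross-check against the matrix product for an arbitrary shock vector.
eps = np.array([1.0, 2.0])
direct = (A.T @ np.outer(eps, eps) @ A)[0, 0]
```

The rounded weights agree with 0.0174, 0.00739, and 0.00078 on the shock terms and 0.986, 0.020, and 0.0001 on the lagged variances and covariance.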


One issue is how to select the shocks to use for the comparison. A seemingly nat- 
ural way to measure the influence of the shocks is to simply shock one of the variables 
(holding the other shock at zero). For example, you could try to shock one of the vari- 
ables by, say one standard deviation, and measure how it affects the volatility forecasts 
of all variables. However (as discussed in detail in Chapter 5), shocks are usually correlated
across equations; a typical shock to one sector involves contemporaneous changes
in the other sectors. As such, you do not want to simply shock one variable and hold
all other $\varepsilon_{it}$ at zero. Moreover, not only are zero values atypical, zero values of $\varepsilon_{it}$
represent the lowest volatility state possible. Conditioning on a zero value means that the
volatility impulse responses will necessarily rise over time.

One way to circumvent the problem is to pick a set of shocks equal to the actual
residuals at some particular date T* in the data set. Obtain the volatility forecasts using
these residuals (i.e., the values of $\varepsilon_{1T^*}$ and $\varepsilon_{2T^*}$) as the initial shocks. A comparison of
these volatility forecasts to the volatility forecasts using the actual values of $\varepsilon_{1T}$ and
$\varepsilon_{2T}$ comprises the volatility impulse response function. If you pick a date at which there
is a large external shock, you can trace the effects of this event on the volatilities.

Although there is no one specific date at which the financial crisis occurred, we
can let there be a shock to both $\varepsilon_{1t}$ and $\varepsilon_{2t}$ equal in magnitude to the actual values of the
shocks on October 29, 2008. The issue is: How would knowledge of the values of these
shocks lead us to revise our volatility forecasts? The results are shown in the three
panels of Figure 3.17. The responses have been standardized in that each is divided by
the actual volatility on October 29, 2008. As you can see from Panels (a) and (c), the
financial crisis shocks induced increases in the forecasted volatilities of both the euro
and the British pound. The volatility of the euro increases by about 25% and that for the
pound by about 38%. These volatility increases were quite persistent in that the euro’s
volatility increase is estimated to last until July 2009 and that for the pound is estimated
to last even longer. Given that both currencies appreciated on October 29, 2008, it is
not surprising that the forecasted covariance between the two variables is estimated to
be higher than otherwise.

FIGURE 3.17 Variance Impulse Responses from October 29, 2008
(Panel (a): Volatility response of the euro; Panel (c): Volatility response of the pound)


13. SUMMARY AND CONCLUSIONS 


Many economic time series exhibit periods of volatility. Conditionally heteroskedastic 
models (ARCH or GARCH) allow the conditional variance of a series to depend on the 
past realizations of the error process. A large realization of the current period’s distur- 
bance increases the conditional variance in subsequent periods. For a stable process, 
the conditional variance will eventually decay to the long-run (unconditional) variance. 
As such, ARCH and GARCH models can capture periods of turbulence and tranquility. 

Conditional variance is often used as a measure of risk. ARCH and GARCH effects 
have been included in a regression framework to test hypotheses of risk-averse agents. 
For example, if producers are risk averse, conditional price variability will affect prod- 
uct supply. Producers may reduce their exposure by withdrawing from the market in 
periods of substantial risk. Similarly, asset prices should be negatively related to their
conditional volatility. Such ARCH effects in the mean of a series (ARCH-M) are a
natural implication of asset-pricing models. 

The basic ARCH and GARCH models have been extended in a number of inter- 
esting ways. The IGARCH model allows volatility shocks to be permanent and the 
TARCH and EGARCH models allow negative shocks to behave differently than posi- 
tive shocks. You can also include explanatory variables in the equation for the condi- 
tional volatility. 

One interesting development is the application of GARCH models in a multivariate 
setting. The problem is that an unrestricted multivariate GARCH has too many param- 
eters to reasonably estimate. Nevertheless, most software packages now incorporate 
a number of specifications that restrict the parameters of the multivariate model. For 
example, EVIEWS and RATS are able to use the method of Engle and Kroner (1995) 
and Bollerslev’s (1990) constant conditional correlations. 

Estimating any type of GARCH model can be difficult. Here are some suggestions 
to improve your estimates. 


1. Be sure that your model of the mean is appropriate. Any misspecification in
the mean equation will carry over into the variance equation. Clearly, the estimated
$\{\varepsilon_t\}$ series must be serially uncorrelated in order to obtain a sensible
model of the conditional variance.


2. It is very easy to “overfit” the data; you could wind up with a very complicated
model when a far more parsimonious model actually captures the nature
of the data-generating process. Pretest the squared residuals for the presence
of ARCH errors. Similarly, do not simply include leverage effects, ARCH-M
effects, or large values of p and/or q without good reason.


3. It is very common to find that the sum of the $\alpha_i$ and the $\beta_i$ is very close to
unity. Such highly persistent volatilities do seem to be the case for financial
data. However, Hillebrand (2005) showed that a neglected structural break in
the variance series can create the appearance of a highly persistent conditional
volatility. After all, if the conditional variance is always small before some
date t* and then is always large, the conditional volatility is definitely persistent.
However, in such a circumstance, the volatility would be best captured
by a dummy variable indicating the break date. Plot the conditional volatility
to ensure that there are several periods with high and low volatilities.

Moreover, as shown by Ma, Nelson, and Startz (2007), the estimated
sum of the GARCH coefficients can also be close to unity when the true
GARCH effect is very small or absent. To explain, suppose you estimate a
GARCH(1, 1) model and find that $\alpha_1 + \beta_1 \approx 1$. As such, the current level
of conditional volatility is expected to prevail into the future. However, this
could happen because the actual data-generating process is a near-IGARCH
process or because the amount of conditional volatility is always constant
(so that $h_t = h_{t-1} = \cdots = \alpha_0$). As such, it is important to examine the ACF of
the squared residuals and pretest for conditional heteroskedasticity. You can
also compare the GARCH model to a low-order ARCH(q) process to check whether
the persistence is actually large. You do not want to estimate a near-IGARCH
process when the amount of conditional volatility is actually quite small.

4. Multivariate GARCH models can be quite difficult to estimate. There are 
a number of different specifications that ease the estimation problems. If 
the diagonal vech model does not provide sufficient interaction among the 
conditional variances and covariances, try the BEKK specification. The CCC 
model, or the DCC model described in the Supplementary Manual (See 
Section 3.1: Appendix 1 to Chapter 3), can be especially helpful in a large 
system. 
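The near-unit persistence discussed in the third suggestion can be quantified with two summary numbers for a GARCH(1, 1) process: the implied long-run variance and the half-life of a volatility shock. The sketch below uses the euro estimates from Table 3.1; the function name is hypothetical, and the half-life formula assumes that deviations of the variance forecast from its long-run level decay geometrically at rate $\alpha_1 + \beta_1$.

```python
import math

def garch11_summary(c, alpha1, beta1):
    """Persistence, long-run variance, and shock half-life (in periods)
    for h_t = c + alpha1*eps_{t-1}^2 + beta1*h_{t-1}."""
    persistence = alpha1 + beta1
    if not 0.0 < persistence < 1.0:
        raise ValueError("the long-run variance exists only if 0 < alpha1 + beta1 < 1")
    long_run_variance = c / (1.0 - persistence)
    half_life = math.log(0.5) / math.log(persistence)
    return persistence, long_run_variance, half_life

# Euro estimates from Table 3.1.
persistence, long_run_variance, half_life = garch11_summary(1.32e-7, 0.047, 0.951)
print(persistence, long_run_variance, half_life)
```

With a persistence of about 0.998, half of a volatility shock is still present after roughly 346 trading days. A figure that large is a reason to check for structural breaks before accepting the estimate at face value.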


QUESTIONS AND EXERCISES 
1. Suppose that the $\{\varepsilon_t\}$ sequence is the ARCH(q) process

$$\varepsilon_t = v_t(\alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \cdots + \alpha_q\varepsilon_{t-q}^2)^{1/2}$$

Show that the conditional expectation $E_{t-1}\varepsilon_t^2$ has the same form as the conditional expectation
of (3.1).
2. Consider the ARCH-M model represented by equations (3.23)–(3.25). Recall that $\{\varepsilon_t\}$ is a
white-noise disturbance; for simplicity, let $E\varepsilon_t^2 = E\varepsilon_{t-1}^2 = \cdots = 1$.
a. Find the unconditional mean $Ey_t$. How does a change in $\delta$ affect the mean? Using the
example of Section 6, show that changing $\beta$ and $\delta$ from (-4, 4) to (-1, 1) preserves the
mean of the $\{y_t\}$ sequence.

b. Show that the unconditional variance of $y_t$ when $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2$ does not depend on $\beta$,
$\delta$, or $\alpha_0$.
3. Bollerslev (1986) proved that the ACF of the squared residuals resulting from the
GARCH(p, q) process represented by (3.9) acts as an ARMA(m, p) process where
m = max(p, q). You are to illustrate this result using the examples below.

a. Consider the GARCH(1, 2) process: $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \beta_1h_{t-1}$. Add the expression
$(\varepsilon_t^2 - h_t)$ to each side so that

$$\varepsilon_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \beta_1h_{t-1} + (\varepsilon_t^2 - h_t)$$

$$= \alpha_0 + (\alpha_1 + \beta_1)\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 - \beta_1(\varepsilon_{t-1}^2 - h_{t-1}) + (\varepsilon_t^2 - h_t)$$

Define $\eta_t = (\varepsilon_t^2 - h_t)$, so that

$$\varepsilon_t^2 = \alpha_0 + (\alpha_1 + \beta_1)\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 - \beta_1\eta_{t-1} + \eta_t$$

Show that

i. $\eta_t$ is serially uncorrelated.
ii. The $\{\varepsilon_t^2\}$ sequence acts as an ARMA(2, 1) process.

b. Consider the GARCH(2, 1) process $h_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1h_{t-1} + \beta_2h_{t-2}$. Show that it is
possible to add $\eta_t$ to each side so as to obtain

$$\varepsilon_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \beta_1h_{t-1} + \eta_t + \beta_2h_{t-2}$$

Show that adding and subtracting the terms $\beta_1\eta_{t-1}$ and $\beta_2\eta_{t-2}$ to the right-hand side of this
equation yields an ARMA(2, 2) process.

c. Provide an intuitive explanation of the statement: “The Lagrange multiplier for ARCH
errors test cannot be used to test the null of white noise squared residuals against an alternative
of a specific GARCH(p, q) process.”

d. Sketch the proof of the general statement that the ACF of the squared residuals resulting
from the GARCH(p, q) process represented by (3.9) acts as an ARMA(m, p) process
where m = max(p, q).

4. Let $y_0 = 0$, and let the first five realizations of the $\{\varepsilon_t\}$ sequence be (1, -1, -2, 1, 1). Plot
each of the following sequences:

Model 1: $y_t = 0.5y_{t-1} + \varepsilon_t$
Model 2: $y_t = \varepsilon_t - \varepsilon_{t-1}^2$
Model 3: $y_t = 0.5y_{t-1} + \varepsilon_t - \varepsilon_{t-1}^2$

a. How does the ARCH-M specification affect the behavior of the $\{y_t\}$ sequence? What is
the influence of the autoregressive term in model 3?

b. For each of the three models, calculate the sample mean and variance of $\{y_t\}$.

5. The file labeled ARCH.XLS contains the 100 realizations of the simulated $\{y_t\}$ sequence
used to create the lower right-hand panel of Figure 3.7. Recall that this series was simulated
as $y_t = 0.9y_{t-1} + \varepsilon_t$, where $\varepsilon_t$ is the ARCH(1) error process $\varepsilon_t = v_t(1 + 0.8\varepsilon_{t-1}^2)^{1/2}$. You
should find the series has a mean of 0.263, a standard deviation of 4.894, and minimum and
maximum values of -10.8 and 15.15, respectively.

a. Estimate the series using OLS and save the residuals. You should obtain

$$y_t = 0.944y_{t-1} + \varepsilon_t$$
(26.51)

Note that the estimated value of $a_1$ differs from the theoretical value of 0.9. This is due to
nothing more than sampling error; the simulated values of $\{v_t\}$ do not precisely conform
to the theoretical distribution. However, can you provide an intuitive explanation of why
positive serial correlation in the $\{v_t\}$ sequence might shift the estimate of $a_1$ upward in
small samples?

b. Obtain the ACF and the PACF of the residuals. Use Ljung-Box Q-statistics to determine
whether the residuals approximate white noise. You should find

          1        2        3        4        5        6        7        8
ACF     0.149    0.004   -0.018   -0.013    0.072   -0.002   -0.110   -0.152
PACF    0.149   -0.018   -0.016   -0.008    0.077   -0.025   -0.109   -0.122

Q(4) = 2.31, Q(8) = 6.39, Q(24) = 18.49.
c. Obtain the ACF and the PACF of the squared residuals. You should find

          1        2        3        4        5        6        7        8
ACF     0.474    0.128   -0.057   -0.077    0.055    0.245    0.279    0.223
PACF    0.474   -0.125   -0.087    0.005    0.132    0.205    0.074    0.067

Based on the ACF and PACF of the residuals and the squared residuals, what can you
conclude about the presence of ARCH errors?

d. Estimate the squared residuals as $\hat{\varepsilon}_t^2 = \alpha_0 + \alpha_1\hat{\varepsilon}_{t-1}^2$. You should verify $\alpha_0$ =
1.55 (t-statistic = 2.83) and $\alpha_1$ = 0.474 (t-statistic = 5.28).
Show that the Lagrange multiplier test for ARCH(1) errors is $TR^2 = 22.03$ with a significance
level of 0.00000269.

e. Simultaneously estimate the $\{y_t\}$ sequence and the ARCH(1) error process using maximum
likelihood estimation. You should find

$$y_t = 0.886y_{t-1} + \varepsilon_t, \qquad h_t = 1.19 + 0.663\varepsilon_{t-1}^2$$
      (32.79)                      (4.02)  (2.89)

6. The second series on the file ARCH.XLS contains 100 observations of a simulated
ARCH-M process.

a. Estimate the $\{y_t\}$ sequence using the Box-Jenkins methodology. Try to improve on the
model

$$y_t = 1.07 + \varepsilon_t + 0.254\varepsilon_{t-3} - 0.262\varepsilon_{t-6}$$
      (22.32)        (2.57)         (-2.64)

b. Examine the ACF and the PACF of the residuals from the MA(||3, 6||) model above. Why
might someone conclude that the residuals appear to be white noise? Now examine the
ACF and PACF of the squared residuals. You should find

          1        2        3        4        5        6
ACF     0.498    0.251    0.290    0.163    0.043    0.114
PACF    0.498    0.004    0.217   -0.088   -0.041    0.101

Perform the LM test for ARCH errors.

c. Estimate the $\{y_t\}$ sequence as the ARCH-M process:

$$y_t = 0.908 + 0.625h_t + \varepsilon_t$$
      (14.05)  (1.79)

$$h_t = 0.108 + 0.597\varepsilon_{t-1}^2$$
      (5.59)   (2.50)

d. Check the ACF and the PACF of the estimated $\{\varepsilon_t\}$ sequence. Do they appear to be satisfactory?
Experiment with several other simple formulations of the ARCH-M process.


7. Consider the ARCH(2) process $E_{t-1}\varepsilon_t^2 = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2$.

a. Suppose that the residuals come from the model $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$. Find the conditional
and unconditional variance of $\{y_t\}$ in terms of the parameters $a_1$, $\alpha_0$, $\alpha_1$, and $\alpha_2$.

b. Suppose that $\{y_t\}$ is an ARCH-M process such that the level of $y_t$ is positively related to
its own conditional variance. For simplicity, let $y_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \varepsilon_t$. Trace out
the impulse response function of $\{y_t\}$ to an $\{\varepsilon_t\}$ shock. You may assume that the system
has been in long-run equilibrium ($\varepsilon_{t-1} = \varepsilon_{t-2} = 0$) but now $\varepsilon_1 = 1$. Thus, the issue is to
find the values of $y_1$, $y_2$, $y_3$, and $y_4$ given that $\varepsilon_2 = \varepsilon_3 = \cdots = 0$.

c. Use your answer to part b to explain the following result. A student estimated $\{y_t\}$ as an
MA(2) process and found the residuals to be white noise. A second student estimated
the same series as the ARCH-M process $y_t = \alpha_0 + \alpha_1\varepsilon_{t-1}^2 + \alpha_2\varepsilon_{t-2}^2 + \varepsilon_t$. Why might both
estimates appear reasonable? How would you decide which is the better model?

d. In general, explain why an ARCH-M model might appear to be a moving average
process.


8. The file RGDP.XLS contains the data used to construct Figures 3.1 and 3.2.

a. If you examine the ACF of the residuals formed from the model in Section 4 concerning the
volatility break in real GDP growth, you should find

          1        2        3        4        5        6
ACF    -0.065    0.117   -0.047   -0.043   -0.120    0.004

However, if you examine the ACF of the standardized residuals, you should find

          1        2        3        4        5        6
ACF    -0.072    0.182   -0.054    0.033   -0.070    0.025

The Ljung-Box Q-statistics indicate no significant serial autocorrelation in the residuals
but significant autocorrelation in the standardized residuals. Explain why the residuals
may show no serial correlation while the standardized residuals indicate serial correlation.

b. Show that a second lag of $y_t$ in the mean equation removes the serial correlation from the
standardized residuals.

c. Create a dummy variable representing the financial crisis. Specifically, let $D2_t$ be a
dummy variable such that $D2_t = 0$ prior to August 2007 and is 1 thereafter. If you
include both $D_t$ and $D2_t$ in the variance equation, do you conclude that a volatility break
occurred as a result of the financial crisis?

d. Use the method presented in Section 4 to show that there were significant volatility
reductions in real consumption and investment in 1984Q1. Compare the volatility
persistence of investment to that of consumption.


9. The file NYSE(RETURNS).XLS contains the daily values of the New York Stock
Exchange Index that was used in Section 10.

a. Reproduce the results of Section 10. 

b. Compare the results reported in the text to those obtained by assuming normality. 

10. Use the data of the file EXRATES(DAILY).XLS to estimate a bivariate model of the pound
and euro exchange rates. In particular,


a. Does a bivariate diagonal vech model yield very different results from those shown in 
Section 11? 

b. Experiment with the convergence criteria and search methods on your software pack- 
age to determine how they influence the estimates you found in part a. Pay particular 
attention to the standard errors on the coefficients. 

c. Try to get convergence for a pure vech model. Compare the results to those you found in 
part a. 

11. In answering the following, you should consult Appendix 3.1 of the Supplementary Manual.

a. Justin finds that a GARCH(2, 1) specification is appropriate for all $h_{ijt}$ in a two variable
diagonal vech model. What is the formula for $h_{12t}$?

b. Jennifer finds that a GARCH(1, 2) specification is appropriate for all $h_{ijt}$ in a two variable
diagonal vech model. What is the formula for $h_{12t}$?

c. In the two variable BEKK model, it was shown that

$$h_{11t} = (c_{11}^2 + c_{12}^2) + (\alpha_{11}^2\varepsilon_{1t-1}^2 + 2\alpha_{11}\alpha_{21}\varepsilon_{1t-1}\varepsilon_{2t-1} + \alpha_{21}^2\varepsilon_{2t-1}^2) + (\beta_{11}^2h_{11t-1} + 2\beta_{11}\beta_{21}h_{12t-1} + \beta_{21}^2h_{22t-1})$$

Let all of the coefficients be positive. If $\varepsilon_{1t-1}$ and $\varepsilon_{2t-1}$ are of opposite signs, can the
term $\alpha_{11}^2\varepsilon_{1t-1}^2 + 2\alpha_{11}\alpha_{21}\varepsilon_{1t-1}\varepsilon_{2t-1} + \alpha_{21}^2\varepsilon_{2t-1}^2$ be negative?

d. Suppose that, in period t, $h_{11t} = 2$ and $h_{22t} = 4.5$. If the CCC model indicates $\rho_{12} = -0.5$,
find $h_{12t}$.

12. In Section 4, it was established that a reasonable model for the price of oil is an MA(1) with
the GARCH conditional variance: $h_t = 0.402 + 0.097\varepsilon_{t-1}^2 + 0.881h_{t-1}$.

a. Estimate the model using a t-distribution. You should find $h_t = 0.37 + 0.10\varepsilon_{t-1}^2 + 0.88h_{t-1}$,
where the degrees of freedom parameter is estimated to be 8.77. Why do you
think that the two estimates are so similar?

b. Why is it reasonable to estimate the conditional variance as an IGARCH process?

c. Figure 3.6 shows that price has a sharp break. What happens to the estimates if your
model of the mean includes a break dummy variable such that $D_t = 0$ if t < July 11, 2008,
and $D_t = 1$ otherwise? What happens if you use a dummy such that $D_t = 0$ unless t =
July 11, 2008? What if you set the dummy such that $D_t = 1$ if t is between July 11, 2008,
and December 31, 2008?

d. Return to the case of normally distributed errors, but allow the GARCH(1, 1) variance
to affect mean returns. Your estimated ARCH-In-Mean model should be $p_t = 0.026 +
0.225\varepsilon_{t-1} + 0.008h_t$. Given that the t-statistic on the coefficient of $h_t$ is 0.65, what do you
conclude?

e. Explain why it is reasonable to argue that an IGARCH model with normally distributed
returns (and no ARCH-In-Mean effects) is a reasonable model for $p_t$. Perform the standard
diagnostic checks for no remaining serial correlation and for no remaining GARCH
effects. Test the model for leverage. Show that the Engle-Ng sign test with 1 lag indicates
leverage, but most other tests indicate the absence of a leverage term.

f. Estimate an EGARCH model and show that it indicates the absence of a leverage effect.


CHAPTER 4 


MODELS WITH TREND 


Learning Objectives 


1. Formalize simple models of variables with a time-dependent mean.
2. Compare models with deterministic versus stochastic trends.
3. Show that the so-called unit root problem arises in standard regression and
in time-series models.
4. Explain how Monte Carlo and simulation techniques can be used to derive
critical values for hypothesis testing.
5. Develop and illustrate the Dickey-Fuller and augmented Dickey-Fuller
tests for the presence of a unit root.
6. Apply the Dickey-Fuller tests to U.S. GDP and real exchange rates.
7. Show how to apply the Dickey-Fuller test to series with serial correlation,
moving average terms, multiple unit roots, and seasonal unit roots.
8. Consider tests for unit roots in the presence of structural change.
9. Illustrate the lack of power of the standard Dickey-Fuller test.
10. Show that generalized least squares (GLS) detrending methods can enhance
the power of the Dickey-Fuller tests.
11. Explain how to use panel unit root tests in order to enhance the power of the
Dickey-Fuller test.
12. Decompose a series with a trend into its stationary and trend components.


1. DETERMINISTIC AND STOCHASTIC TRENDS 


It is helpful to represent the general solution to a linear stochastic difference equation
as consisting of these three distinct parts:¹


y, = trend + stationary component + noise 


Chapter 2 explained how to model the stationary component using the Box-Jenkins
methodology. Chapter 3 showed you how to model the variance of the error
(i.e., noise) component. A critical task for applied econometricians is to develop
simple stochastic difference equation models that mimic the behavior of trending
variables. The file RGDP.XLS contains the quarterly values of real U.S. GDP over
the 1947Q1-2012Q4 period (in billions of year 2005 dollars). From the plot of the
data shown in Figure 4.1, it is clear that the distinguishing feature of real GDP $\{rgdp_t\}$
is that it increases over time. For such a series, a naive forecaster might estimate the
sustained increase using the following cubic polynomial model for the trend:


$$rgdp_t = 1890.247 + 9.108t + 0.170t^2 - 0.000173t^3 \qquad (4.1)$$
          (27.66)    (4.09)    (8.70)     (-2.07)


FIGURE 4.1 A Deterministic Trend in Real GDP
(vertical axis: billions of 2005 dollars; horizontal axis: 1950 to 2010; series: actual, fitted, forecasts)


The fitted values are shown as the dashed lines in the figure, and the forecasted values
are shown as the solid line extending past 2012Q4. Regardless of the t-statistics, the
use of such a model for the trend of real GDP is problematic. Since there are no stochastic
components in the trend, (4.1) implies that there is a deterministic long-run growth
rate of the real economy. The “Real Business Cycle” school argues that technological
advancements have permanent effects on the trend of the macroeconomy. Since technological
innovations are stochastic, the trend should reflect this underlying randomness.
Adherents to other schools of macroeconomics would also argue that the trend is not
completely deterministic. For example, they might point out that an oil price shock or
a targeted tax reduction could affect investment and the economy’s long-term growth
rate. Moreover, the implications for the behavior of the business cycle are not credible.
The deterministic trend implies that, whenever real GDP is below trend, in subsequent
periods, there will be unusually high growth as real GDP returns to the trend. The reaction
to the 2007-2008 financial crisis suggests that most economists and politicians do
not take this notion very seriously. In fact, the forecasts beyond 2012 seem to totally
ignore the decline in GDP resulting from the financial crisis. Lastly, the negative coefficient
on the $t^3$ term implies permanent declines in future GDP after a sufficiently long time.
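The last point can be checked numerically. The sketch below takes the coefficients of (4.1) as reported, reads the final two terms as the quadratic and cubic terms, and assumes that t counts quarters from the start of the sample; the variable and function names are hypothetical.

```python
import numpy as np

# Coefficients of the cubic trend (4.1): b0 + b1*t + b2*t^2 + b3*t^3.
b0, b1, b2, b3 = 1890.247, 9.108, 0.170, -0.000173

def cubic_trend(t):
    """Fitted trend value of real GDP at time index t."""
    return b0 + b1 * t + b2 * t ** 2 + b3 * t ** 3

# The trend stops rising where its slope b1 + 2*b2*t + 3*b3*t^2 equals zero.
turning_points = np.roots([3.0 * b3, 2.0 * b2, b1])
t_peak = float(max(turning_points.real))
print(t_peak, cubic_trend(t_peak))
```

Under these assumptions the fitted trend peaks after roughly 681 quarters, about 170 years into the sample, and declines forever afterward. No one would accept that as a long-run forecast of real GDP.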
Reexamine the 3-month T-bill rate and the yield on 5-year U.S. federal government
securities shown in Figure 3.4. The two interest rates have no obvious tendency
to increase or decrease. Moreover, there are no decided structural breaks that induce 
one-time shifts in the mean. Nevertheless, there is no pronounced tendency for either 
series to revert to a long-run mean. The key feature of a trend is that it has a permanent 
effect on a series. If the trend is defined as the permanent or nondecaying component 
of a time series, the two interest rates have a trend. 

Suppose that a series always changes by the same fixed amount from one period
to the next. To be more specific, suppose that

$$\Delta y_t = a_0$$

As you know from Chapter 1, the solution to this linear difference equation is

$$y_t = y_0 + a_0 t$$

where $y_0$ is the initial condition for period zero.

Hence, the solution for $\Delta y_t = a_0$ turns out to be nothing more than a deterministic
linear time trend; the intercept is $y_0$ and the slope is $a_0$. Now, if we add the stationary
component $A(L)\varepsilon_t$ to the trend, we obtain

$$y_t = y_0 + a_0 t + A(L)\varepsilon_t \qquad (4.2)$$

In (4.2), $y_t$ can differ from its trend value by the amount $A(L)\varepsilon_t$. Since this deviation
is stationary, the $\{y_t\}$ sequence will exhibit only temporary departures from the trend.
As such, the long-term forecast of $y_{t+s}$ will converge to the trend line $y_0 + a_0(t + s)$. In
the jargon of the profession, this type of model is called a trend stationary (TS) model.

Now suppose that the expected change in $y_t$ is $a_0$ units. In particular, let $\Delta y_t$ be
equal to $a_0$ plus a white-noise term:

$$\Delta y_t = a_0 + \varepsilon_t \qquad (4.3)$$

Sometimes, $\Delta y_t$ exceeds $a_0$ and sometimes it falls short of $a_0$. Since $E_{t-1}\varepsilon_t = 0$,
(4.3) implies that $y_t$ is expected to change by $a_0$ units from one period to the next. This
seemingly innocuous modification of (4.2) has profound implications for the trend. If $y_0$
is the initial condition, it is readily verified that the general solution to the first-order
difference equation represented by (4.3) is

$$y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i + a_0 t$$

Here, $y_t$ consists of the deterministic trend component $a_0 t$ and the component
$y_0 + \sum \varepsilon_i$. We can think of this second component as a stochastic intercept term. In the
absence of any shocks, the intercept is $y_0$. However, each $\varepsilon_i$ shock represents a shift in
the intercept. Since all values of $\{\varepsilon_i\}$ have a coefficient of unity, the effect of each shock
on the intercept term is permanent. In the time-series literature, such a sequence is said
to have a stochastic trend since each $\varepsilon_i$ shock imparts a permanent, albeit random,
change in the conditional mean of the series. If $a_0 = 0$, this type of model seems to
capture some of the behavior of the interest rates. The two rates have no particular
tendency to increase or decrease over time; neither do they exhibit any tendency to revert
to a given mean value.
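The claim that the solution is "readily verified" can be checked numerically. The sketch below iterates (4.3) forward and compares the result with the closed-form solution; the seed, the sample length, and the values of $y_0$ and $a_0$ are illustrative assumptions, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed (assumption)
T, y0, a0 = 100, 1.0, 0.5           # illustrative values (assumptions)
eps = rng.standard_normal(T)        # white-noise shocks eps_1, ..., eps_T

# Iterate the difference equation y_t = y_{t-1} + a0 + eps_t
y_iter = np.empty(T + 1)
y_iter[0] = y0
for t in range(1, T + 1):
    y_iter[t] = y_iter[t - 1] + a0 + eps[t - 1]

# Closed-form solution y_t = y0 + sum_{i<=t} eps_i + a0 * t
t_index = np.arange(T + 1)
stochastic_intercept = y0 + np.concatenate(([0.0], np.cumsum(eps)))
y_closed = stochastic_intercept + a0 * t_index

print(np.allclose(y_iter, y_closed))
```

The array `stochastic_intercept` is the component $y_0 + \sum \varepsilon_i$: each shock shifts it once and the shift never decays.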
The Random Walk Model


Equation (4.3) is the basic building block for modeling series containing stochastic
trends. Since these models are probably unfamiliar to you, the remainder of this section
explores the nature of stochastic trends. We begin by considering the special case of
(4.3) when $a_0 = 0$. This model, known as the random walk model, has a special place
in the economics and finance literature. For example, some formulations of the efficient
market hypothesis posit that the change in the price of a stock from one day to the next
is completely random. As such, the current price ($y_t$) should be equal to last period’s
price plus a white-noise term, so that

$$y_t = y_{t-1} + \varepsilon_t \quad (\text{or } \Delta y_t = \varepsilon_t)$$

Similarly, suppose you were betting on the outcome of a coin toss and a head
added \$1 to your wealth while a tail cost you \$1. We could let $\varepsilon_t = +\$1$ if a head
appears and $-\$1$ in the event of a tail. Thus, your current wealth ($y_t$) equals last period’s
wealth ($y_{t-1}$) plus the realized value of $\varepsilon_t$. If you play again, your wealth in $t + 1$ is
$y_{t+1} = y_t + \varepsilon_{t+1}$.

If $y_0$ is a given initial condition, it can be readily verified that the general solution
to the first-order difference equation represented by the random walk model is

$$y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i$$

Taking expected values, we obtain $Ey_t = Ey_{t-s} = y_0$; thus, the mean of a random
walk is a constant. However, all stochastic shocks have nondecaying effects on the $\{y_t\}$
sequence. Given the first $t$ realizations of the $\{\varepsilon_t\}$ process, the conditional mean of
$y_{t+1}$ is

$$E_t y_{t+1} = E_t(y_t + \varepsilon_{t+1}) = y_t$$

Similarly, the conditional mean of $y_{t+s}$ (for any $s > 0$) can be obtained from

$$y_{t+s} = y_t + \sum_{i=1}^{s} \varepsilon_{t+i}$$

so that

$$E_t y_{t+s} = y_t + E_t \sum_{i=1}^{s} \varepsilon_{t+i} = y_t$$

For any positive value of $s$, the conditional means for all values of $y_{t+s}$ are equivalent.
Hence, the constant value of $y_t$ is the unbiased estimator of all future values of
$y_{t+s}$. To interpret, note that an $\varepsilon_t$ shock has a permanent effect on $y_t$. This permanence
is directly reflected in the forecasts for $y_{t+s}$.
It is easy to show that the variance is time dependent. Given the value of $y_0$, the
variance can be constructed as

$$\mathrm{var}(y_t) = \mathrm{var}(\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1) = t\sigma^2$$

and

$$\mathrm{var}(y_{t-s}) = \mathrm{var}(\varepsilon_{t-s} + \varepsilon_{t-s-1} + \cdots + \varepsilon_1) = (t - s)\sigma^2$$


Since the variance is not constant [i.e., $\mathrm{var}(y_t) \neq \mathrm{var}(y_{t-s})$], the random walk
process is nonstationary. Moreover, as $t \to \infty$, the variance of $y_t$ also approaches infinity.
Thus, the random walk meanders without exhibiting any tendency to increase or
decrease. It is also instructive to calculate the covariance of $y_t$ and $y_{t-s}$. Since the mean
is constant, we can form the covariance $\gamma_{t-s}$ as

$$E[(y_t - y_0)(y_{t-s} - y_0)] = E[(\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1)(\varepsilon_{t-s} + \varepsilon_{t-s-1} + \cdots + \varepsilon_1)]$$
$$= E[(\varepsilon_{t-s})^2 + (\varepsilon_{t-s-1})^2 + \cdots + (\varepsilon_1)^2]$$
$$= (t - s)\sigma^2$$

To form the correlation coefficient $\rho_s$, we can divide $\gamma_{t-s}$ by the product of the
standard deviation (SD) of $y_t$ and the SD of $y_{t-s}$. Thus, the correlation coefficient
$\rho_s$ is

$$\rho_s = (t - s)/\sqrt{(t - s)t}$$
$$= [(t - s)/t]^{0.5} \qquad (4.4)$$

This result plays an important role in the detection of nonstationary series. For
the first few autocorrelations, the sample size $t$ will be large relative to the number
of autocorrelations formed; for small values of $s$, the ratio $(t - s)/t$ is approximately
equal to unity. However, as $s$ increases, the values of $\rho_s$ will decline. Hence, when using
sample data, the autocorrelation function for a random walk process will show a slight
tendency to decay. Thus, it will not be possible to use the autocorrelation function to
distinguish between a unit root process and a stationary process with an autoregressive
coefficient that is close to unity.
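The variance and correlation formulas above can be checked by simulation. The following is a minimal Monte Carlo sketch, assuming $y_0 = 0$ and $\sigma^2 = 1$; the number of replications, the dates $t$ and $s$, and the seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)              # arbitrary seed (assumption)
n_paths, t, s, sigma = 20000, 100, 20, 1.0  # illustrative settings (assumptions)

# Each row is one random walk with y0 = 0; column j holds y_{j+1}
eps = rng.normal(0.0, sigma, size=(n_paths, t))
y = np.cumsum(eps, axis=1)

y_t = y[:, t - 1]          # y_t across replications
y_ts = y[:, t - s - 1]     # y_{t-s} across replications

var_t = y_t.var()                       # theory: t * sigma^2
var_ts = y_ts.var()                     # theory: (t - s) * sigma^2
rho_s = np.corrcoef(y_t, y_ts)[0, 1]    # theory: sqrt((t - s) / t), as in (4.4)
rho_theory = np.sqrt((t - s) / t)

print(var_t, var_ts, rho_s, rho_theory)
```

Here the moments are computed across replications at fixed dates, which is the sense in which (4.4) is a theoretical correlation; a sample correlogram from a single realization only approximates it.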

Panel (a) in Figure 4.2 shows the time path of a simulated random walk process.
First, 100 normally distributed random deviates were drawn from a theoretical distribution
with a mean of zero and a variance equal to unity. By setting $y_0 = 1$, each value of
$y_t$ ($t = 1, \ldots, 100$) was constructed by adding the random variable to the value of $y_{t-1}$.
As expected, the series meanders without any tendency to revert to a long-run value.
However, there does appear to be a slight positive trend in the simulated data. The reason
for the upward trend is that this particular simulation happened to contain more
positive values than negative values. The impression of a steadily increasing trend in
the true data-generating process is false and serves as a reminder against relying solely
on casual inspection.


The Random Walk Plus Drift Model 


Now, let the change in $y_t$ be partially deterministic and partially stochastic. The random
walk plus drift model augments the random walk model by adding a constant term
$a_0$, so that

$$y_t = y_{t-1} + a_0 + \varepsilon_t$$

[Panel (a): Random walk; Panel (b): Random walk plus drift; Panel (c): Trend-stationary; Panel (d): Random walk plus noise]
FIGURE 4.2 Four Series With Trends


Hence, you can see that (4.3) is actually a random walk plus drift process. Given
the initial condition $y_0$, the general solution for $y_t$ is given by

$$y_t = y_0 + a_0 t + \sum_{i=1}^{t} \varepsilon_i \qquad (4.5)$$

Here, the behavior of $y_t$ is governed by two nonstationary components: a linear
deterministic trend and the stochastic trend $\sum \varepsilon_i$. As such, a random walk plus drift is a
pure model of a trend; there is no separate stationary component in (4.5).

If we take expectations, the mean of $y_t$ is $y_0 + a_0 t$ and the mean of $y_{t+s}$ is
$Ey_{t+s} = y_0 + a_0(t + s)$. To explain, the deterministic change in each realization of $\{y_t\}$
is $a_0$; after $t$ periods, the cumulated change is $a_0 t$. In addition, there is the stochastic
trend $\sum \varepsilon_i$; each $\varepsilon_i$ shock has a permanent effect on the mean of $y_t$. Notice that the first
difference of the series is stationary; taking the first difference yields the stationary
sequence $\Delta y_t = a_0 + \varepsilon_t$.

Panel (b) of Figure 4.2 illustrates a simulated random walk plus drift model. The
value of $a_0$ was set equal to 0.5, and (4.5) was simulated using the same 100 deviates
used for the random walk model above. Clearly, the deterministic time trend dominates
the time path of the series. In a very large sample, asymptotic theory suggests this
will always be the case. However, you should not conclude that it is always easy to
discern the difference between a random walk model and a model with drift. In a small
sample, increasing the variance of $\{\varepsilon_t\}$ or decreasing the absolute value of $a_0$ could
cloud the long-run properties of the sequence. Panel (c) uses the same random numbers
to generate the TS series $y_t = 0.5t + \varepsilon_t$. The patterns evident in the random walk plus
drift model and the TS series look strikingly similar to each other and to the real GDP
series shown in Figure 4.1.
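A rough reconstruction of how Panels (a) through (c) could be generated from a single set of deviates is sketched below. The seed is arbitrary, so the draws will not match those used for Figure 4.2, and using $y_0 = 1$ for the drift model is an assumption carried over from the random walk simulation.

```python
import numpy as np

rng = np.random.default_rng(3)       # arbitrary seed; not the book's deviates
T, y0, a0 = 100, 1.0, 0.5            # y0 = 1 for the drift model is an assumption
eps = rng.standard_normal(T)         # the same 100 deviates feed all three series
t = np.arange(1, T + 1)

random_walk = y0 + np.cumsum(eps)            # Panel (a): y_t = y_{t-1} + eps_t
rw_drift = y0 + a0 * t + np.cumsum(eps)      # Panel (b): equation (4.5)
trend_stationary = a0 * t + eps              # Panel (c): y_t = 0.5 t + eps_t

# Deviations from the deterministic trend line behave very differently:
dev_ds = rw_drift - (y0 + a0 * t)    # accumulated shocks (stochastic trend)
dev_ts = trend_stationary - a0 * t   # current shock only (stationary)

print(dev_ds.std(), dev_ts.std())
```

Although the two trending series can look alike when plotted, `dev_ds` is itself a random walk while `dev_ts` is white noise.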
To obtain the s-step-ahead forecast for a random walk plus drift, update (4.5) by $s$
periods to obtain

$$y_{t+s} = y_0 + a_0(t + s) + \sum_{i=1}^{t+s} \varepsilon_i$$
$$= y_t + a_0 s + \sum_{i=1}^{s} \varepsilon_{t+i}$$

Taking the conditional expectation of $y_{t+s}$, it follows that

$$E_t y_{t+s} = y_t + a_0 s$$

In contrast to the pure random walk model, the forecast function is not flat. The
fact that the average change in $y_t$ is always the constant $a_0$ is reflected in the forecast
function. In addition to the given value of $y_t$, we project this deterministic change
$s$ times into the future.
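The forecast function can be written in a few lines and compared with the average of simulated future values. This is only a sketch: the function name `drift_forecast`, the current value $y_t = 10$, and the other settings are hypothetical choices for illustration.

```python
import numpy as np

def drift_forecast(y_t, a0, horizon):
    """Conditional means E_t y_{t+s} = y_t + a0 * s for s = 1, ..., horizon."""
    s = np.arange(1, horizon + 1)
    return y_t + a0 * s

# Monte Carlo check of the s-step-ahead conditional mean
rng = np.random.default_rng(2)           # arbitrary seed (assumption)
y_t, a0, s = 10.0, 0.5, 8                # illustrative values (assumptions)
future_shocks = rng.standard_normal((50000, s))
y_ts = y_t + a0 * s + future_shocks.sum(axis=1)   # simulated y_{t+s}

print(drift_forecast(y_t, a0, s)[-1], y_ts.mean())
print(drift_forecast(y_t, 0.0, 3))       # a0 = 0: flat forecast of a pure random walk
```

Setting `a0` to zero recovers the flat forecast function of the pure random walk.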


Generalizations of the Stochastic Trend Model 


It is not too difficult to generalize the random walk model to allow $y_t$ to be the sum of
a stochastic trend and a white-noise component. Formally, this third model, called a
random walk plus noise, is represented by

$$y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i + \eta_t \qquad (4.6)$$

where $\{\eta_t\}$ is a white-noise process with variance $\sigma_\eta^2$, and $\varepsilon_t$ and $\eta_{t-s}$ are independently
distributed for all $t$ and $s$ [i.e., $E(\varepsilon_t \eta_{t-s}) = 0$].
If we take the first difference of (4.6), the random walk plus noise model becomes

$$\Delta y_t = \varepsilon_t + \Delta\eta_t \qquad (4.7)$$

You can easily verify that (4.6) and (4.7) are equivalent by writing $y_{t-1}$ as

$$y_{t-1} = y_0 + \sum_{i=1}^{t-1} \varepsilon_i + \eta_{t-1}$$

Subtract this expression from (4.6) to obtain (4.7). From (4.6), you can see that the
key properties of the random walk plus noise model are as follows:


1. Given the value $y_0$, the mean of the $\{y_t\}$ sequence is constant: $Ey_t = y_0$ and
updating by $s$ periods yields $Ey_{t+s} = y_0$. Notice that the successive $\varepsilon_t$ shocks
have permanent effects on the $\{y_t\}$ sequence in that there is no decay factor
on the past values of $\varepsilon_i$. Hence, $y_t$ has the stochastic trend component $\sum \varepsilon_i$.

2. The $\{y_t\}$ sequence has a pure noise component in that the $\{\eta_t\}$ sequence has
only a temporary effect on the $\{y_t\}$ sequence. The current realization of $\eta_t$
affects only $y_t$ but not the subsequent values $y_{t+s}$.

3. The variance of $\{y_t\}$ is not constant: $\mathrm{var}(y_t) = t\sigma^2 + \sigma_\eta^2$ and $\mathrm{var}(y_{t-s}) =
(t - s)\sigma^2 + \sigma_\eta^2$. As in the other models with a stochastic trend, the variance
of $y_t$ approaches infinity as $t$ increases. The presence of the noise component
means that the correlation coefficient between $y_t$ and $y_{t-s}$ is smaller than that
for the pure random walk model.


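To prove that the sample correlogram will exhibit even faster decay than in the
pure random walk model, note that the covariance between $y_t$ and $y_{t-s}$ is

$$\mathrm{cov}(y_t, y_{t-s}) = E[(y_t - y_0)(y_{t-s} - y_0)]$$
$$= E[(\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \cdots + \varepsilon_t + \eta_t)(\varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \cdots + \varepsilon_{t-s} + \eta_{t-s})]$$

Since $\{\varepsilon_t\}$ and $\{\eta_t\}$ are independent white-noise sequences,

$$\mathrm{cov}(y_t, y_{t-s}) = (t - s)\sigma^2$$

Thus, the correlation coefficient $\rho_s$ is

$$\rho_s = \frac{(t - s)\sigma^2}{\sqrt{(t\sigma^2 + \sigma_\eta^2)[(t - s)\sigma^2 + \sigma_\eta^2]}}$$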

Comparison of $\rho_s$ with the correlation coefficient for the pure random walk model
(i.e., equation 4.4) verifies that the autocorrelations for the random walk plus noise
model are always smaller for $\sigma_\eta^2 > 0$. Panel (d) of Figure 4.2 shows a random walk plus
noise model. The series was simulated by drawing a second 100 normally distributed
random terms to represent the $\{\eta_t\}$ series. For each value of $t$, $y_t$ was calculated using
(4.6). If we compare Panels (a) and (d), it can be seen that the two series track each other
quite well. The random walk plus noise model could mimic the same set of macroeconomic
variables as the random walk model. The effect of the “noise” component $\{\eta_t\}$
is to increase the variance of $\{y_t\}$ without affecting its long-run behavior. After all,
the random walk plus noise series is nothing more than the random walk model with a
purely temporary component added.
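The comparison between the two correlation coefficients can be made concrete with a short calculation. In the sketch below, the function names, the variances ($\sigma^2 = 1$, $\sigma_\eta^2 = 4$), and the dates are assumptions chosen for illustration; the Monte Carlo step simply checks the formula at those values.

```python
import numpy as np

def rho_rw(t, s):
    """Equation (4.4): correlation of y_t and y_{t-s} for a pure random walk."""
    return np.sqrt((t - s) / t)

def rho_rw_noise(t, s, sig2, sig2_eta):
    """Correlation of y_t and y_{t-s} for the random walk plus noise model."""
    num = (t - s) * sig2
    den = np.sqrt((t * sig2 + sig2_eta) * ((t - s) * sig2 + sig2_eta))
    return num / den

# Monte Carlo check at illustrative values (assumptions)
rng = np.random.default_rng(4)
n_paths, t, s = 20000, 100, 20
sig2, sig2_eta = 1.0, 4.0
trend = np.cumsum(rng.normal(0.0, np.sqrt(sig2), (n_paths, t)), axis=1)
eta = rng.normal(0.0, np.sqrt(sig2_eta), (n_paths, 2))   # eta_t and eta_{t-s}
y_t = trend[:, t - 1] + eta[:, 0]
y_ts = trend[:, t - s - 1] + eta[:, 1]
rho_mc = np.corrcoef(y_t, y_ts)[0, 1]

print(rho_rw(t, s), rho_rw_noise(t, s, sig2, sig2_eta), rho_mc)
```

With the noise variance set to zero the second function collapses to (4.4), which is a quick consistency check on the two formulas.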

The random walk plus noise and the random walk plus drift models are the building
blocks of more complex time-series models. For example, the noise and drift components
can easily be incorporated into a single model by modifying (4.7) such that the
trend in $y_t$ contains a deterministic and a stochastic component. Specifically, replace
(4.7) with

$$\Delta y_t = a_0 + \varepsilon_t + \Delta\eta_t$$

or

$$y_t = y_0 + a_0 t + \sum_{i=1}^{t} \varepsilon_i + \eta_t \qquad (4.8)$$

Equation (4.8) is called the trend plus noise model; $y_t$ is the sum of a deterministic
trend, a stochastic trend, and a pure white-noise term. Moreover, the noise sequence
does not need to be a white-noise process. Let $A(L)$ be a polynomial in the lag operator
$L$; it is possible to augment a random walk plus drift process with the stationary process
$A(L)\eta_t$ so that the general trend plus irregular model is

$$y_t = y_0 + a_0 t + \sum_{i=1}^{t} \varepsilon_i + A(L)\eta_t \qquad (4.9)$$

Thus, (4.9) has a deterministic trend, a stochastic trend, and a stationary component.
Many more details of these unobserved components models are examined in
Section 4.1 of the Supplementary Manual. It is useful to work through this section and
to understand the application of signal extraction methods to this class of model.


2. REMOVING THE TREND 


From the previous section, it should be clear that there are important differences 
between a series with a trend and a stationary series. Shocks to a stationary time 
series are necessarily temporary; over time, the effects of the shocks will dissipate, 
and the series will revert to its long-run mean level. On the other hand, a series 
containing a stochastic trend will not revert to a long-run level. Note that the trend 
can have deterministic and stochastic components. These components of the trend 
have important implications for the appropriate transformation necessary to attain 
a stationary series. The usual methods for eliminating the trend are differencing 
and detrending. For historical reasons, regressing a variable on a constant and time 
and saving the residuals is called detrending. We still use this term even though the 
method removes only a deterministic, not a stochastic, trend. A series containing a 
unit root can be made stationary by differencing. In fact, we already know that the 
dth difference of an ARIMA(p, d, q) model is stationary. The aim of this section is to
compare these two methods of isolating the trend. 


Differencing 


First consider the solution for the random walk plus drift model:

$$y_t = y_0 + a_0 t + \sum_{i=1}^{t} \varepsilon_i$$

Taking the first difference, we obtain $\Delta y_t = a_0 + \varepsilon_t$. Clearly, the $\{\Delta y_t\}$
sequence, equal to a constant plus a white-noise disturbance, is stationary.
Viewing $\Delta y_t$ as the variable of interest, we have

$$E(\Delta y_t) = E(a_0 + \varepsilon_t) = a_0$$
$$\mathrm{var}(\Delta y_t) = E(\Delta y_t - a_0)^2 = E(\varepsilon_t)^2 = \sigma^2$$

and for $s \neq 0$

$$\mathrm{cov}(\Delta y_t, \Delta y_{t-s}) = E[(\Delta y_t - a_0)(\Delta y_{t-s} - a_0)] = E(\varepsilon_t \varepsilon_{t-s}) = 0$$

Since the mean and variance are constants and the covariance between $\Delta y_t$ and
$\Delta y_{t-s}$ does not depend on $t$, the $\{\Delta y_t\}$ sequence is stationary.

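The random walk plus noise model is an interesting case study. In first differences,
the model can be written as $\Delta y_t = \varepsilon_t + \Delta\eta_t$. In this form, it is easy to show that $\Delta y_t$ is
stationary. Clearly, the mean is zero because

$$E\Delta y_t = E(\varepsilon_t + \Delta\eta_t) = 0$$

Moreover, the variance and all autocovariances are constant and time invariant
because

$$\mathrm{var}(\Delta y_t) = E[(\Delta y_t)^2] = E[(\varepsilon_t + \Delta\eta_t)^2]$$
$$= E[(\varepsilon_t)^2 + 2\varepsilon_t\Delta\eta_t + (\Delta\eta_t)^2]$$
$$= \sigma^2 + 2E[\varepsilon_t\Delta\eta_t] + E[(\eta_t)^2 - 2\eta_t\eta_{t-1} + (\eta_{t-1})^2] = \sigma^2 + 2\sigma_\eta^2$$
$$\mathrm{cov}(\Delta y_t, \Delta y_{t-1}) = E[(\varepsilon_t + \eta_t - \eta_{t-1})(\varepsilon_{t-1} + \eta_{t-1} - \eta_{t-2})] = -\sigma_\eta^2$$

and

$$\mathrm{cov}(\Delta y_t, \Delta y_{t-s}) = E[(\varepsilon_t + \eta_t - \eta_{t-1})(\varepsilon_{t-s} + \eta_{t-s} - \eta_{t-s-1})] = 0 \quad \text{for } s > 1.$$

If we set $s = 1$, the correlation coefficient between $\Delta y_t$ and $\Delta y_{t-1}$ is

$$\rho_1 = \frac{\mathrm{cov}(\Delta y_t, \Delta y_{t-1})}{\mathrm{var}(\Delta y_t)} = \frac{-\sigma_\eta^2}{\sigma^2 + 2\sigma_\eta^2}$$

Examination reveals $-0.5 < \rho_1 < 0$ and that all other correlation coefficients are
zero. Since the first difference of $y_t$ acts exactly as an MA(1) process, the random walk
plus noise model is ARIMA(0, 1, 1). Since adding a constant to a series has no effect
on the correlogram, it additionally follows that the trend plus noise model of (4.8) also
acts as an ARIMA(0,1,1) process.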
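The MA(1) pattern in the differenced series is easy to see in simulated data. The sketch below assumes normally distributed shocks with $\sigma = 1$ and $\sigma_\eta = 2$ and a long sample; the helper `sample_autocorr` is a hypothetical name for a plain sample autocorrelation.

```python
import numpy as np

rng = np.random.default_rng(5)              # arbitrary seed (assumption)
T, sigma, sigma_eta = 200000, 1.0, 2.0      # illustrative values (assumptions)
eps = rng.normal(0.0, sigma, T)
eta = rng.normal(0.0, sigma_eta, T + 1)

# First difference of the random walk plus noise model, equation (4.7)
dy = eps + np.diff(eta)                     # Delta y_t = eps_t + Delta eta_t

def sample_autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

rho1_theory = -sigma_eta**2 / (sigma**2 + 2 * sigma_eta**2)   # equals -4/9 here

print(sample_autocorr(dy, 1), rho1_theory)  # lag 1: negative, above -0.5
print(sample_autocorr(dy, 2))               # lag 2: approximately zero
```

Raising `sigma_eta` relative to `sigma` pushes the lag-1 autocorrelation toward $-0.5$, while lowering it pushes the value toward zero.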

Now consider the general class of ARIMA(p, d, q) models:

$$A(L)y_t = B(L)\varepsilon_t \qquad (4.10)$$

where $A(L)$ and $B(L)$ are polynomials of orders $p$ and $q$ in the lag operator $L$.

First, suppose that $A(L)$ has a single unit root and that $B(L)$ has all roots outside
the unit circle. We can factor $A(L)$ into two components $(1 - L)A^*(L)$, where $A^*(L)$ is
a polynomial of order $p - 1$. Since $A(L)$ has only one unit root, it follows that all roots
of $A^*(L)$ are outside the unit circle. Thus, we can write (4.10) as

$$(1 - L)A^*(L)y_t = B(L)\varepsilon_t$$

Now, define $y_t^* = \Delta y_t$ so that

$$A^*(L)y_t^* = B(L)\varepsilon_t \qquad (4.11)$$

The $\{y_t^*\}$ sequence is stationary since all roots of $A^*(L)$ lie outside the unit circle.
The point is that the first difference of a unit root process is stationary. If $A(L)$ has
two unit roots, the same argument can be used to show that the second difference of
$\{y_t\}$ is stationary. The general point is that the dth difference of a process with d unit
roots is stationary. Such a sequence is integrated of order d and is denoted by I(d).
An ARIMA(p, d, q) model has d unit roots; the dth difference of such a model is a
stationary ARMA(p, q) process.
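As a small illustration of the general point, the sketch below builds a series with two unit roots by cumulating white noise twice and then differences it. The construction (zero initial conditions, white-noise shocks, seed) is an assumption made to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(9)    # arbitrary seed (assumption)
eps = rng.standard_normal(500)    # white-noise shocks

y_i1 = np.cumsum(eps)             # one unit root:  (1 - L) y_t = eps_t
y_i2 = np.cumsum(y_i1)            # two unit roots: (1 - L)^2 y_t = eps_t

d1 = np.diff(y_i2, n=1)           # first difference is still I(1)
d2 = np.diff(y_i2, n=2)           # second difference recovers the shocks

print(np.allclose(d1, y_i1[1:]), np.allclose(d2, eps[2:]))
```

Each call to `np.diff` drops one observation, so the differenced arrays are compared with correspondingly shortened originals.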


Detrending 


We have shown that differencing can sometimes be used to transform a nonstationary
model into a stationary model with an ARMA representation. This does not mean
that all nonstationary models can be transformed into well-behaved ARMA models by
appropriate differencing. Consider, for example, a model that is the sum of a deterministic
trend and a pure noise component:

$$y_t = y_0 + a_1 t + \varepsilon_t$$

The first difference of $y_t$ is not well-behaved because

$$\Delta y_t = a_1 + \varepsilon_t - \varepsilon_{t-1}$$

Here, $\Delta y_t$ is not invertible in the sense that $\Delta y_t$ cannot be expressed in the form of an
autoregressive process. Recall that invertibility of a stationary process requires that the
MA component does not have a unit root.

Instead, an appropriate way to transform this model is to estimate the regression
equation $y_t = a_0 + a_1 t + \varepsilon_t$. Subtracting the estimated values of $y_t$ from the observed
series yields estimated values of the $\{\varepsilon_t\}$ series. More generally, a time series may have
the polynomial trend as in

$$y_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \cdots + a_n t^n + e_t$$

where $\{e_t\}$ is a stationary process.

Detrending is accomplished by regressing $\{y_t\}$ on a deterministic polynomial time
trend, as in (4.1). The appropriate degree of the polynomial can be determined by standard
t-tests, F-tests, and/or using statistics such as the AIC or the SBC. The common
practice is to estimate the regression equation using the largest value of $n$ deemed reasonable.
If the t-statistic indicates $a_n$ is zero, consider a polynomial trend of order $n - 1$.
Continue to pare down the order of the polynomial trend until a nonzero coefficient
is found. F-tests can be used to determine whether a group of coefficients, say, $a_{n-i}$
through $a_n$, is statistically different from zero. The AIC and SBC statistics can be used
to reconfirm the appropriate degree of the polynomial.

Simply subtracting the estimated values of the $\{y_t\}$ sequence from the actual values
yields an estimate of the stationary sequence $\{e_t\}$. The detrended process can then be
modeled using traditional methods (such as ARMA estimation).
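A minimal sketch of polynomial detrending with an information criterion is given below. It is not the procedure used for (4.1): the data are simulated from a quadratic trend, time is rescaled to the unit interval to keep the regression well conditioned, and the AIC is computed in one common form, $T \ln(SSR) + 2n$; all of these are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(6)               # arbitrary seed (assumption)
T = 200
tau = np.arange(1, T + 1) / T                # rescaled time index (assumption)
# Simulated TS series with a quadratic trend and white-noise deviations
y = 1.0 + 2.0 * tau + 3.0 * tau**2 + rng.normal(0.0, 0.1, T)

def detrend(y, tau, degree):
    """Residuals from regressing y on a polynomial in tau of the given degree."""
    coefs = P.polyfit(tau, y, degree)        # coefficients ordered a_0, a_1, ...
    return y - P.polyval(tau, coefs)

def aic(resid, n_params):
    """One common form of the AIC: T * ln(SSR) + 2 * (number of parameters)."""
    return len(resid) * np.log(np.sum(resid**2)) + 2 * n_params

aics = {n: aic(detrend(y, tau, n), n + 1) for n in range(0, 5)}
best = min(aics, key=aics.get)               # degree with the smallest AIC
e_hat = detrend(y, tau, best)                # estimate of the stationary {e_t}

print(best, e_hat.std())
```

In applied work the t-tests and F-tests described above would be used alongside the criterion; the sketch only shows the mechanics of fitting each degree and saving the residuals.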


Difference versus Trend Stationary Models 


We have encountered two ways to eliminate a trend. A trend stationary series can be
transformed into a stationary series by removing the deterministic trend. A series with
a unit root, sometimes called a difference stationary (DS) series, can be transformed
into a stationary series by differencing. A serious problem is encountered when the
inappropriate method is used to eliminate trend. We saw an example of the problem in
attempting to difference the equation $y_t = y_0 + a_1 t + \varepsilon_t$. Consider a more general trend
stationary process of the form

$$A(L)y_t = a_0 + a_1 t + e_t$$

where the characteristic roots of the polynomial $A(L)$ are all outside the unit circle, and
the expression $e_t$ is allowed to have the form $e_t = B(L)\varepsilon_t$. Subtracting an estimate of
the deterministic time trend yields a stationary and invertible ARMA model. However,
if we use the notation of (4.11), the first difference of such a model yields

$$A(L)y_t^* = a_1 + (1 - L)B(L)\varepsilon_t$$

First differencing the TS process has introduced a noninvertible unit root process
into the MA component of the model. Of course, the same problem is encountered in
a model with a polynomial time trend.

In the same way, subtracting a deterministic time trend from a DS process is also
inappropriate. For example, in the general trend plus irregular model of (4.9), subtracting
$y_0 + a_0 t$ from each observation does not result in a stationary series since the
stochastic portion of the trend is not eliminated.
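Both mistakes can be illustrated with simulated data. In the sketch below, the TS series is a linear trend plus white noise and the DS series is a random walk plus drift built from the same shocks; the parameter values, sample size, and the helper `autocorr` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)        # arbitrary seed (assumption)
T, a0, a1 = 5000, 1.0, 0.5            # illustrative values (assumptions)
t = np.arange(1, T + 1)
eps = rng.standard_normal(T)

def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag."""
    x = x - x.mean()
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

# Mistake 1: differencing a TS series gives a_1 + eps_t - eps_{t-1},
# an MA(1) with a unit root, whose lag-1 autocorrelation is -0.5.
ts = a0 + a1 * t + eps
rho1_diff_ts = autocorr(np.diff(ts), 1)

# Mistake 2: detrending a DS series leaves the stochastic trend in the residuals.
ds = a0 + a1 * t + np.cumsum(eps)
slope, intercept = np.polyfit(t, ds, 1)          # highest power first
resid_ds = ds - (intercept + slope * t)
rho1_detrended_ds = autocorr(resid_ds, 1)

print(rho1_diff_ts, rho1_detrended_ds)
```

The first number sits near $-0.5$, the boundary value for an MA(1) process, while the second stays close to one, which is what a series that still contains a stochastic trend would show.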


Are There Business Cycles? 


Traditional business cycle research decomposed real macroeconomic variables into a 
long-run (secular) trend and a cyclical component. The typical decomposition is illus- 
trated by the hypothetical data in Figure 4.3. The secular trend, portrayed by the straight 
line, was deemed to be in the domain of growth theory. The slope of the trend line was 
thought to be determined by long-run factors such as technological growth, fertility, 
immigration, and educational attainment levels. 

One source of the deviations from trend occurs because of the wavelike motion of 
real economic activity called the business cycle. Although the actual period of the cycle 
was never thought to be as regular as that depicted in the figure, the periods of prosperity 
and recovery were regarded to be as inevitable as the tides. The goal of monetary and 
fiscal policy was to reduce the amplitude of the cycle (measured by distance ab). In 
terms of our previous discussion, the trend is the nonstationary component, and the 
cyclical and irregular components are stationary. 

Although there have been recessions and periods of high prosperity, the post-World 
War II experience taught us that business cycles do not have a regular period. Even so, 
there is a widespread belief that, over the long run, macroeconomic variables grow at 
a constant trend rate and that any deviations from trend are eventually eliminated by 
the invisible hand. The belief that trend is unchanging over time leads to the common 
practice of detrending macroeconomic data using a linear (or polynomial) deterministic 
regression equation. The lower portion of the figure shows the cycle and the noise (or 
irregular) component after detrending. 

Nelson and Plosser (1982) challenged the traditional view by demonstrating that
important macroeconomic variables tend to be DS rather than TS processes. They
obtained time-series data for 13 important macroeconomic time series: real GNP,
nominal GNP, industrial production, employment, unemployment rate, GNP deflator,
consumer prices, wages, real wages, money stock, velocity, bond yields, and an index
of common stock prices. The sample began as early as 1860 for consumer prices
to as late as 1909 for GNP data and ended in 1970 for all of the series. Some of
their findings are reported in Table 4.1. The first two columns report the first- and
second-order autocorrelations of real and nominal GNPs, industrial production, and
the unemployment rate. Notice that the autocorrelations of the first three of the series
are strongly indicative of a unit root process. Although $\rho_1$ for the unemployment rate
is 0.75, the second-order autocorrelation is less than 0.5.

FIGURE 4.3 The Business Cycle?

First differences of the series yield the first- and second-order sample autocorrelations
r(1) and r(2), respectively. Sample autocorrelations of the first differences are
indicative of stationary processes. The evidence supports the claim that the data are
generated from DS processes. Nelson and Plosser point out that the positive autocorrelation
of differenced real and nominal GNP at lag 1 only is suggestive of an MA(1)
process. To further strengthen the argument for DS processes, recall that differencing
a TS process yields a noninvertible moving average process. None of the differenced
series reported by Nelson and Plosser appears to have a unit root in the MA terms.

Table 4.1 Selected Autocorrelations From Nelson and Plosser

                         ρ(1)    ρ(2)    r(1)    r(2)    d(1)    d(2)
Real GNP                 0.95    0.90    0.34    0.04    0.87    0.66
Nominal GNP              0.95    0.89    0.44    0.08    0.93    0.79
Industrial production    0.97    0.94    0.03   -0.11    0.84    0.67
Unemployment rate        0.75    0.47    0.09   -0.29    0.75    0.46

Notes:

1. Full details of the correlogram can be obtained from Nelson and Plosser (1982), who report the first six
sample autocorrelations.

2. ρ(i), r(i), and d(i) refer to the ith-order autocorrelation coefficient for each series, for the first difference of
the series, and for the detrended values of the series, respectively.

The results from fitting a linear trend to the data and forming sample autocorre- 
lations of the residuals are given in the last two columns of the table. An interesting 
feature of the data is that the sample autocorrelations of the detrended data are reason- 
ably high. This is consistent with the fact that detrending a DS series will not eliminate 
the nonstationarity. Notice that detrending the unemployment rate has no effect on the 
autocorrelations. The overall implication is that macroeconomic variables do not grow 
at a smooth long-run rate. Some macroeconomic shocks are of a permanent nature; the 
effects of such shocks are never eliminated. 


The Trend in Real GDP 


Another way to make the same point is to note that the real GDP series shown in 
Figure 4.1 has a clear trend. However, the tight fit of the estimated model might fool 
a researcher into thinking the series is actually stationary around the cubic trend line 
shown in Figure 4.1. Our eyes can be deceived because such trend lines are fit so as to 
make the observed residuals as small as possible. The ACF and PACF of the residuals 
from (4.1) are shown in Panel (a) of Figure 4.4. You can see that the ACF decays slowly 
while the PACF cuts to zero after one lag. In fact, this type of slow decay in the ACF 
is typical of a series with a stochastic trend. Thus, detrending the data does not seem 
to result in a stationary series. Panel (b) shows the ACF and PACF of the logarithmic 
change in real GDP. The ACF and PACF quickly converge to zero; after two lags, all 
autocorrelations and partial autocorrelations are not statistically different from zero. 
The estimated model for the logarithmic change in real GDP ($\Delta lrgdp_t$) is

$$\Delta lrgdp_t = 0.0049 + 0.3706\,\Delta lrgdp_{t-1}$$
(6.80) (6.44)

Unlike the model of the deterministic trend, the residuals from this model all
appear to be white noise. Thus, differencing is sufficient to remove the trend.

FIGURE 4.4 ACF and PACF


Rather than rely solely on an analysis of correlograms, it is possible to formally 
test whether a series is stationary. We examine such tests in the next several sections. 
The testing procedure is not as straightforward as it may seem. We cannot use the usual 
testing techniques because classical procedures all presume that the data are stationary. 
For now, it suffices to say that Nelson and Plosser are not able to reject the null hypoth- 
esis of a unit root. However, before we examine the tests for a unit root, it is important 
to note that the issue of nonstationarity also arises quite naturally in the context of the 
standard regression model. 


3. UNIT ROOTS AND REGRESSION RESIDUALS 


Consider the regression equation

$$y_t = a_0 + a_1 z_t + e_t \qquad (4.12)$$

where the symbol $e_t$ is used to indicate that the error term may be serially correlated.

The assumptions of the classical regression model necessitate that both the $\{y_t\}$
and $\{z_t\}$ sequences be stationary and that the errors have a zero mean and a finite variance.
In the presence of nonstationary variables, there might be what Granger and
Newbold (1974) call a spurious regression. A spurious regression has a high $R^2$
and t-statistics that appear to be significant, but the results are without any economic
meaning. The regression output “looks good,” but the least-squares estimates are not
consistent and the customary tests of statistical inference do not hold. Granger and
Newbold (1974) provide a detailed examination of the consequences of violating the
stationarity assumption by generating two sequences, $\{y_t\}$ and $\{z_t\}$, as independent
random walks using the formulas:

$$y_t = y_{t-1} + \varepsilon_{yt} \qquad (4.13)$$

and

$$z_t = z_{t-1} + \varepsilon_{zt} \qquad (4.14)$$

where $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are white-noise processes that are independent of each other.

Granger and Newbold generated many such samples and, for each sample, estimated
a regression in the form of (4.12). Since the $\{y_t\}$ and $\{z_t\}$ sequences are independent
of each other, (4.12) is necessarily meaningless; any relationship between the
two variables is spurious. Surprisingly, at the 5% significance level, they were able to
reject the null hypothesis $a_1 = 0$ in approximately 75% of the cases. Of course, at the
5% level, a correctly sized test would yield rejections in only 5% of the regressions.
Moreover, the regressions usually had very high $R^2$ values, and the estimated residuals
exhibited a high degree of autocorrelation.
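An experiment in the spirit of the one described above can be sketched as follows. This is not a replication of Granger and Newbold's design: the sample length of 100, the 2,000 replications, the seed, and the use of the normal critical value 1.96 are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def slope_t_stat(y, z):
    """OLS t-statistic for a_1 in y_t = a_0 + a_1 z_t + e_t."""
    X = np.column_stack((np.ones_like(z), z))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)            # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)            # conventional OLS covariance
    return beta[1] / np.sqrt(cov[1, 1])

def rejection_rate(T, n_reps, rng, random_walks=True):
    """Share of replications in which |t| > 1.96 for two independent series."""
    rejections = 0
    for _ in range(n_reps):
        e_y, e_z = rng.standard_normal((2, T))
        if random_walks:
            y, z = np.cumsum(e_y), np.cumsum(e_z)   # (4.13) and (4.14)
        else:
            y, z = e_y, e_z                         # stationary benchmark
        rejections += abs(slope_t_stat(y, z)) > 1.96
    return rejections / n_reps

rng = np.random.default_rng(8)                   # arbitrary seed (assumption)
rate_rw = rejection_rate(100, 2000, rng)
rate_wn = rejection_rate(100, 2000, rng, random_walks=False)
print(rate_rw, rate_wn)
```

The benchmark with independent white-noise series should reject at roughly the nominal rate, whereas the independent random walks reject far more often.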

To explain the findings of Granger and Newbold, note that the regression
equation (4.12) is necessarily meaningless if the residual series $\{e_t\}$ is nonstationary.
Obviously, if the $\{e_t\}$ sequence has a stochastic trend, any error in period $t$ never decays
so that any deviation from the model is permanent. It is hard to imagine attaching
any importance to an economic model having permanent errors. The simplest way to
examine the properties of the $\{e_t\}$ sequence is to abstract from the intercept term $a_0$
and rewrite (4.12) as

$$e_t = y_t - a_1 z_t$$

If $y_t$ and $z_t$ are generated by (4.13) and (4.14), we can impose the initial conditions
$y_0 = z_0 = 0$ so that

$$e_t = \sum_{i=1}^{t} \varepsilon_{yi} - a_1 \sum_{i=1}^{t} \varepsilon_{zi} \qquad (4.15)$$

Clearly, the variance of the error becomes infinitely large as $t$ increases. Moreover,
the error has a permanent component in that $E_t e_{t+i} = e_t$ for all $i > 0$. Hence, the
assumptions embedded in the usual hypothesis tests are violated so that any $t$-test, $F$-test, or
$R^2$ value is unreliable. It is easy to see why the estimated residuals from a spurious
regression will exhibit a high degree of autocorrelation. Updating (4.15), you should
be able to demonstrate that the theoretical value of the correlation coefficient between
$e_t$ and $e_{t+1}$ goes to unity as $t$ increases.
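One way to carry out this demonstration is sketched below. It assumes that $\varepsilon_{yt}$ and $\varepsilon_{zt}$ have constant variances, written here as $\sigma_y^2$ and $\sigma_z^2$ (symbols introduced only for this illustration), and uses the independence of the two white-noise processes.

```latex
\begin{aligned}
\operatorname{var}(e_t) &= \operatorname{var}\!\Big(\sum_{i=1}^{t}\varepsilon_{yi} - a_1\sum_{i=1}^{t}\varepsilon_{zi}\Big)
  = t\,(\sigma_y^2 + a_1^2\sigma_z^2) \\
e_{t+1} &= e_t + \varepsilon_{y,t+1} - a_1\varepsilon_{z,t+1}
  \;\Rightarrow\; \operatorname{cov}(e_t, e_{t+1}) = \operatorname{var}(e_t) \\
\operatorname{corr}(e_t, e_{t+1}) &=
  \frac{t\,(\sigma_y^2 + a_1^2\sigma_z^2)}{\sqrt{t\,(t+1)}\;(\sigma_y^2 + a_1^2\sigma_z^2)}
  = \sqrt{\frac{t}{t+1}} \;\longrightarrow\; 1 \quad \text{as } t \to \infty
\end{aligned}
```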

Even though the true value of $a_1 = 0$, suppose that you estimate (4.12) and want
to test the null hypothesis $a_1 = 0$. From (4.15), it should be clear that the error term is
nonstationary. Yet, the assumption that the error term is a unit root process is inconsistent
with the distributional theory underlying the use of OLS. This problem will not
disappear in large samples. In fact, Phillips (1986) proves that the larger the sample,
the more likely you are to falsely conclude that $a_1 \neq 0$.

Worksheet 4.1 illustrates the problem of spurious regressions. The top two graphs
show 100 realizations of the $\{y_t\}$ and $\{z_t\}$ sequences generated according to (4.13) and
(4.14). Although $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ are drawn from white-noise distributions, the
realizations of the two sequences are such that $y_{100}$ is positive and $z_{100}$ is negative.

In the lower left panel, you can see that the regression of $y_t$ on $z_t$ captures the
within-sample tendency of the sequences to move in opposite directions. The straight
line shown in the scatter plot is the OLS regression line $y_t = 1.41 - 0.565z_t$. The
correlation coefficient between $\{y_t\}$ and $\{z_t\}$ is $-0.69$. The residuals from this regression
have a unit root; as such, the coefficients 1.41 and $-0.565$ are spurious. Worksheet 4.2
illustrates the same problem using two simulated random walk plus drift sequences:
$y_t = 0.2 + y_{t-1} + \varepsilon_{yt}$ and $z_t = -0.1 + z_{t-1} + \varepsilon_{zt}$. The drift terms dominate so that for
small values of $t$, it appears that $y_t = -2z_t$. As sample size increases, however, the
cumulated sum of the errors (i.e., $\sum \varepsilon_t$) will pull the relationship further and further
from $-2.0$. The scatter plot of the two sequences suggests that the $R^2$ statistic will
be close to unity; in fact, $R^2$ is 0.93. However, as you can see in the last panel of
Worksheet 4.2, the residuals from the regression equation are nonstationary. All
departures from this relationship are necessarily permanent.

WORKSHEET 4.1
SPURIOUS REGRESSIONS: EXAMPLE 1

Consider the two random walk processes

$$y_t = y_{t-1} + \varepsilon_{yt} \qquad z_t = z_{t-1} + \varepsilon_{zt}$$

[Figure: time paths of the simulated $\{y_t\}$ sequence (left panel) and $\{z_t\}$ sequence (right panel), $t = 1, \ldots, 100$]

Since both series are unit root processes with uncorrelated error terms, the regression
of $y_t$ on $z_t$ is spurious. Given the realizations of $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$, it happens that $y_t$ tends
to increase as $z_t$ tends to decrease. The regression line shown in the scatter plot of $y_t$ on
$z_t$ captures this tendency. The correlation coefficient between $y_t$ and $z_t$ is $-0.69$ and a
linear regression yields $y_t = 1.41 - 0.565z_t$. However, the residuals from the regression
equation are nonstationary.

[Figure: Scatter Plot of $y_t$ Against $z_t$ (left panel); Regression Residuals (right panel)]

WORKSHEET 4.2
SPURIOUS REGRESSIONS: EXAMPLE 2

Consider the two random walk plus drift processes

$$y_t = 0.2 + y_{t-1} + \varepsilon_{yt} \qquad z_t = -0.1 + z_{t-1} + \varepsilon_{zt}$$

[Figure: time paths of the simulated $\{y_t\}$ sequence (left panel) and $\{z_t\}$ sequence (right panel), $t = 1, \ldots, 100$]

Again, the $\{y_t\}$ and $\{z_t\}$ series are unit root processes with uncorrelated error terms
so that the regression of $y_t$ on $z_t$ is spurious. Although it is the deterministic drift terms that
cause the sustained increase in $y_t$ and the overall decline in $z_t$, it appears that the two series
are inversely related to each other. The residuals from the regression $y_t = 6.38 - 0.10z_t$
are nonstationary.

[Figure: Scatter Plot of $y_t$ Against $z_t$ (left panel); Regression Residuals (right panel)]

The point is that the econometrician has to be very careful in working with non- 
stationary variables. In terms of (4.12), there are four cases to consider: 


CASE 1


Both $\{y_t\}$ and $\{z_t\}$ are stationary. When both variables are stationary, the classical
regression model is appropriate.


CASE 2


The $\{y_t\}$ and $\{z_t\}$ sequences are integrated of different orders. Regression
equations using such variables are meaningless. For example, replace (4.14)
with the stationary process $z_t = \rho z_{t-1} + \varepsilon_{zt}$ where $|\rho| < 1$. Now (4.15) is replaced
by $e_t = \sum \varepsilon_{yi} - a_1 \sum \rho^i \varepsilon_{z,t-i}$. Although the expression $\sum \rho^i \varepsilon_{z,t-i}$ is convergent, the
$\{e_t\}$ sequence still contains a stochastic trend component.


CASE 3


The nonstationary $\{y_t\}$ and $\{z_t\}$ sequences are integrated of the same order, and
the residual sequence contains a stochastic trend. This is the case in which the
regression is spurious. The results from such spurious regressions are meaningless
in that all errors are permanent. In this case, it is often recommended that the
regression equation be estimated in first differences. Consider the first difference
of (4.12):

$$\Delta y_t = a_1 \Delta z_t + \Delta e_t$$

Since $y_t$, $z_t$, and $e_t$ each contain unit roots, the first difference of each is stationary.
Hence, the usual asymptotic results apply. Of course, if one of the trends is
deterministic and the other is stochastic, first differencing each is not appropriate.


CASE 4


The nonstationary $\{y_t\}$ and $\{z_t\}$ sequences are integrated of the same order and
the residual sequence is stationary. In this circumstance, $\{y_t\}$ and $\{z_t\}$ are cointegrated.
A trivial example of a cointegrated system occurs if $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are
perfectly correlated. If $\varepsilon_{yt} = \varepsilon_{zt}$, then (4.15) can be set equal to zero (which is
stationary) by setting $a_1 = 1$. To consider a more interesting example, suppose
that both $z_t$ and $y_t$ are the random walk plus noise processes:

$$y_t = \mu_t + \varepsilon_{yt}$$
$$z_t = \mu_t + \varepsilon_{zt}$$

where $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are white-noise processes and $\mu_t$ is the random walk process
$\mu_t = \mu_{t-1} + \varepsilon_t$. Note that both $\{z_t\}$ and $\{y_t\}$ are $I(1)$ processes but that $y_t - z_t =
\varepsilon_{yt} - \varepsilon_{zt}$ is stationary. The subtraction of $z_t$ from $y_t$ serves to nullify the stochastic
trend.
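The random walk plus noise example in Case 4 can be illustrated with a short simulation. The sketch below assumes standard normal innovations, a sample of 5,000 observations, and an arbitrary seed; none of these choices come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000

mu = np.cumsum(rng.standard_normal(T))   # common random walk: mu_t = mu_{t-1} + eps_t
y = mu + rng.standard_normal(T)          # y_t = mu_t + eps_yt
z = mu + rng.standard_normal(T)          # z_t = mu_t + eps_zt

gap = y - z                              # equals eps_yt - eps_zt, so it is stationary
first_half_var = gap[: T // 2].var()
second_half_var = gap[T // 2:].var()

print("sample variance of y:           ", y.var())
print("variance of y - z, first half:  ", first_half_var)
print("variance of y - z, second half: ", second_half_var)
```

Under these assumptions the variance of $y_t - z_t$ should stay near 2 in both halves of the sample, while the level series wander with the common trend $\mu_t$.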


All of Chapter 6 is devoted to the issue of cointegrated variables. For now, it is suffi- 
cient to note that pretesting the variables in a regression for nonstationarity is extremely 
important. Estimating a regression in the form of (4.12) is meaningless if cases 2 or 3 
apply. If the variables are cointegrated, the results of Chapter 6 apply. The remainder of 
this chapter considers the formal test procedures for the presence of unit roots and/or 
deterministic time trends. 
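The first-difference remedy mentioned under Case 3 can be checked by simulation. The sketch below compares rejection rates for $a_1 = 0$ when two independent random walks are regressed in levels and in first differences. It is an illustration under assumed settings (`T=100`, 2,000 replications, a 1.96 critical value, and an intercept included in both regressions), not a procedure taken from the text.

```python
import numpy as np

def slope_t_stat(y, x):
    """t-statistic on the slope in an OLS regression of y on a constant and x."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - X.shape[1])
    return beta[1] / np.sqrt(s2 * XtX_inv[1, 1])

def rejection_rates(T=100, n_reps=2000, crit=1.96, seed=1):
    rng = np.random.default_rng(seed)
    levels = diffs = 0
    for _ in range(n_reps):
        y = np.cumsum(rng.standard_normal(T))   # independent random walks
        z = np.cumsum(rng.standard_normal(T))
        levels += int(abs(slope_t_stat(y, z)) > crit)                    # regression in levels
        diffs += int(abs(slope_t_stat(np.diff(y), np.diff(z))) > crit)   # regression in first differences
    return levels / n_reps, diffs / n_reps

levels_rate, diffs_rate = rejection_rates()
print(f"levels: {levels_rate:.3f}   first differences: {diffs_rate:.3f}")
```

The levels regression should reject far too often, whereas the differenced regression should reject at a rate close to the nominal 5%.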
4. THE MONTE CARLO METHOD


As an applied researcher, you need to know whether a data series contains a trend and
the best way to estimate the trend. You also need to avoid several critical mistakes.
Clearly, you do not want to difference or detrend a stationary series. Moreover, you
do not want to detrend a unit root process or difference a trend stationary process.
Although the properties of a sample correlogram are useful tools for detecting the possible
presence of unit roots or deterministic trends, the method is necessarily imprecise.
What may appear as a unit root to one observer may appear as a stationary process to
another. The problem is difficult because the ACF of a near-unit root process has the same
shape as that of a process containing a trend. For example, the correlogram of
a stationary AR(1) process such that $\rho_1 = 0.95$ will exhibit the type of gradual decay
indicative of a nonstationary process. To illustrate some of the issues involved, suppose
that we know a series is generated from the following first-order process:

$$y_t = a_1 y_{t-1} + \varepsilon_t \qquad (4.16)$$

where $\{\varepsilon_t\}$ is white noise.

First, suppose that we wish to test the null hypothesis that $a_1 = 0$. Under the maintained
null hypothesis of $a_1 = 0$, we can estimate (4.16) using OLS. The fact that $\varepsilon_t$ is
a white-noise process and that $|a_1| < 1$ guarantees that the $\{y_t\}$ sequence is stationary
and that the estimate of $a_1$ is efficient. Calculating the standard error of the estimate
of $a_1$, the researcher can use a $t$-test to determine whether $a_1$ is significantly different
from zero.

The situation is quite different if we want to test the hypothesis $a_1 = 1$. Now, under
the null hypothesis, the $\{y_t\}$ sequence is generated by the nonstationary process:

$$y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i \qquad (4.17)$$

Thus, if $a_1 = 1$, the variance becomes infinitely large as $t$ increases. Under the
null hypothesis, it is inappropriate to use classical statistical methods to estimate and
perform significance tests on the coefficient $a_1$. If the $\{y_t\}$ sequence is generated as in
(4.17), it is simple to show that the OLS estimate of (4.16) will yield a biased estimate
of $a_1$. In Section 1, it was shown that the first-order autocorrelation coefficient in a
random walk model is

$$\rho_1 = [(t - 1)/t]^{0.5} < 1$$

Since the estimate of $a_1$ is directly related to the value of $\rho_1$, the estimated value
of $a_1$ is biased to be below its true value of unity. The estimated model will mimic that
of a stationary AR(1) process with a near unit root. Hence, the usual $t$-test cannot be
used to test the hypothesis $a_1 = 1$.
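The downward bias can be seen in a small simulation. The sketch below generates random walks with $y_0 = 0$ and estimates (4.16) by OLS without an intercept; the sample size of 100, the 5,000 replications, the seed, and the helper name `ar1_estimate` are assumptions made for the illustration.

```python
import numpy as np

def ar1_estimate(y):
    """OLS estimate of a1 in y_t = a1*y_{t-1} + e_t (no intercept)."""
    lag, cur = y[:-1], y[1:]
    return (lag @ cur) / (lag @ lag)

rng = np.random.default_rng(3)
T, n_reps = 100, 5000
estimates = np.empty(n_reps)
for r in range(n_reps):
    # y_0 = 0 followed by T realizations of the random walk (true a1 = 1)
    y = np.concatenate(([0.0], np.cumsum(rng.standard_normal(T))))
    estimates[r] = ar1_estimate(y)

mean_estimate = estimates.mean()
share_below_one = (estimates < 1.0).mean()
print("mean estimate of a1:    ", mean_estimate)
print("share of estimates < 1: ", share_below_one)
```

Even though every series is a true random walk, the average estimate should fall below unity and most individual estimates should lie below 1.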

FIGURE 4.5 A Simulated Random Walk Process

Figure 4.5 shows the sample correlogram for a simulated random walk process.
One hundred normally distributed random deviates were obtained so as to mimic the
$\{\varepsilon_t\}$ sequence. Assuming $y_0 = 0$, the next 100 values in the $\{y_t\}$ sequence were
calculated as $y_t = y_{t-1} + \varepsilon_t$. This particular correlogram is characteristic of most sample
correlograms constructed from nonstationary data. The estimated value of $\rho_1$ is close
to unity and the sample autocorrelations die out slowly. If we did not know the way in
which the data were generated, inspection of Figure 4.5 might lead us to falsely conclude
that the data were generated from a stationary process. With these particular data,
estimates of an AR(1) model with and without an intercept yield (standard errors are
in parentheses):

$$y_t = \underset{(0.030)}{0.9546}\,y_{t-1} + \varepsilon_t \qquad R^2 = 0.860 \qquad (4.18)$$

$$y_t = 0.164 + \underset{(0.037)}{0.9247}\,y_{t-1} + \varepsilon_t \qquad R^2 = 0.864 \qquad (4.19)$$


Examining (4.18), a careful researcher would not be willing to dismiss the possibility
of a unit root since the estimated value of $a_1$ is only 1.5133 standard deviations from
unity: $[(1 - 0.9546)/0.030 = 1.5133]$. We might correctly recognize that, under the null
hypothesis of a unit root, the estimate of $a_1$ will be biased below unity. If we knew the
true distribution of $a_1$ under the null of a unit root, we could perform such a significance
test. Of course, if we did not know the true data-generating process, we might
estimate the model with an intercept. In (4.19), the estimate of $a_1$ is more than two standard
deviations from unity: $(1 - 0.9247)/0.037 = 2.035$. However, it would be wrong
to use this information to reject the null of a unit root. After all, the point of this section
has been to indicate that such $t$-tests are inappropriate under the null of a unit root.

Fortunately, Dickey and Fuller (1979, 1981) devised a procedure to formally test
for the presence of a unit root. Their methodology is similar to that used in constructing
the data reported in Figure 4.5. Suppose that we generated thousands of random
walk sequences and that, for each, we calculated the estimated value of $a_1$. Although
most of the estimates would be close to unity, some would be further from unity than
others. In performing this experiment, Dickey and Fuller found that in the presence of
an intercept:

■ 90% of the estimated values of $a_1$ are less than 2.58 standard errors from unity;

■ 95% of the estimated values of $a_1$ are less than 2.89 standard errors from unity;

■ 99% of the estimated values of $a_1$ are less than 3.51 standard errors from unity.

The application of these Dickey-Fuller critical values to tests for unit roots is
straightforward. Suppose we did not know the true data-generating process and were
trying to ascertain whether the data used in Figure 4.5 contained a unit root. Using
these Dickey-Fuller statistics, we would not reject the null of a unit root in (4.19).
The estimated value of $a_1$ is only 2.035 standard deviations from unity. In fact, if the
true value of $a_1$ does equal unity, we should find the estimated value to be within 2.58
standard deviations from unity 90% of the time.

Be aware that stationarity necessitates $-1 < a_1 < 1$ or, equivalently, $a_1^2 < 1$. Thus,
if the estimated value of $a_1$ is close to $-1$, you should also be concerned about
nonstationarity. If we define $\gamma = a_1 - 1$, the equivalent restriction is $-2 < \gamma < 0$. In
conducting a Dickey-Fuller test, it is possible to check that the estimated value of $\gamma$ is
greater than $-2$. Nevertheless, with economic data, such a case is exceedingly rare. As
such, almost all unit root tests are one-sided tests with the alternative hypothesis $\gamma < 0$.


Monte Carlo Experiments 


The procedure that Dickey and Fuller (1979, 1981) used to obtain their critical values 
is typical of that found in the modern time-series literature. Hypothesis tests concern- 
ing the coefficients of nonstationary variables cannot be conducted using traditional 
t-tests or F-tests. The distributions of the appropriate test statistics are nonstandard and 
cannot be analytically evaluated. However, given the trivial cost of computer time, the 
nonstandard distributions can easily be derived using a Monte Carlo simulation. 

A Monte Carlo experiment attempts to replicate an actual data-generating process
(DGP) on a computer. To be more specific, you simulate a data set with the essential
characteristics of the actual data in question. A Monte Carlo experiment generates a
random sample of size $T$ and the parameters and/or sample statistics of interest are
calculated. This process is repeated $N$ times (where $N$ is a large number) so that the
distribution of the desired parameters and/or sample statistics can be tabulated. These
empirical distributions are used as estimates of the actual distributions.

All major statistical software packages have a built-in random number generator.
The first step in a Monte Carlo experiment is to have the computer generate a set of random
numbers (sometimes called pseudorandom numbers) from a given distribution. Of course,
the numbers cannot be entirely random since all computer algorithms rely on a deterministic
number-generating mechanism. However, the numbers are drawn so as to
mimic a random process having some specified distribution. Usually, the numbers are
designed to be normally distributed and serially uncorrelated. The idea is to use these
numbers to represent one replication of the entire $\{\varepsilon_t\}$ sequence. If you want to know
more about pseudorandom number generation, see Section 4.2 of the Supplementary
Manual. The Programming Manual illustrates the Monte Carlo method for a number
of different time-series models.

The second step is to construct the $\{y_t\}$ sequence using the random numbers and
the parameters of the data-generating process. For example, Dickey and Fuller (1979,
1981) obtained 100 values for $\{\varepsilon_t\}$, set $a_1 = 1$, $y_0 = 0$ and calculated 100 values for
$\{y_t\}$ according to (4.16). Once a series has been generated, the third step is to estimate
the parameters of interest (such as the estimate of $a_1$ or the in-sample variance of the
$\{y_t\}$ series).

The beauty of the method is that all important attributes of the constructed $\{y_t\}$
sequence are known to the researcher. For this reason, a Monte Carlo simulation is often
referred to as an “experiment.” The only problem is that the set of random numbers
drawn is just one possible outcome. Obviously, the estimates in (4.18) and (4.19) are
dependent on the values of the simulated $\{\varepsilon_t\}$ sequence. Different outcomes for $\{\varepsilon_t\}$
will yield different values of the simulated $\{y_t\}$ sequence.

This is why Monte Carlo studies perform many replications of the process outlined
above. The fourth step is to replicate steps 1 through 3 thousands of times. The goal is to
ensure that the statistical properties of the constructed $\{y_t\}$ sequence are in accordance
with the true distribution. Thus, for each replication, the parameters of interest are
tabulated and critical values (or confidence intervals) obtained. As such, the properties of
your data can be compared to the properties of the simulated data so that hypothesis
tests can be performed.

For our purposes, it suffices to say that the use of the Monte Carlo method is
warranted by the Law of Large Numbers. Consider the simplest case where $v_t$ is an
identically and independently distributed (i.i.d.) random number with mean $\mu$ and variance
$\sigma^2$ so that

$$v_t \sim (\mu, \sigma^2)$$

The sample mean constructed by using $T$ observations of the $\{v_t\}$ sequence is

$$\bar{v} = (1/T) \sum_{t=1}^{T} v_t$$

By the Law of Large Numbers, as the sample size $T$ grows sufficiently large,
$\bar{v}$ converges to the true mean $\mu$. Hence, the sample mean $\bar{v}$ is an unbiased estimate of the
population mean. This is the justification for using the Dickey-Fuller critical values to
test the hypothesis $a_1 = 1$. Moreover, if the draws are independent and the sample size
$T$ grows sufficiently large, the distribution of $\bar{v}$ approaches a normal distribution with
mean $\mu$ and variance $\sigma^2/T$.⁴
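The behavior of the sample mean can be checked numerically. The sketch below draws i.i.d. normal numbers with assumed values $\mu = 2$ and $\sigma = 3$ (chosen only for illustration) and compares the variance of $\bar{v}$ across 2,000 replications with $\sigma^2/T$ for several sample sizes.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n_reps = 2.0, 3.0, 2000     # illustrative values, not from the text

results = {}
for T in (10, 100, 1000):
    draws = rng.normal(mu, sigma, size=(n_reps, T))   # n_reps samples of size T
    v_bar = draws.mean(axis=1)                        # one sample mean per replication
    # (average of the sample means, their variance, theoretical variance sigma^2/T)
    results[T] = (v_bar.mean(), v_bar.var(), sigma**2 / T)
    print(T, results[T])
```

As $T$ grows, the sample means should cluster more tightly around $\mu$, with a simulated variance close to $\sigma^2/T$.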

An important limitation of a Monte Carlo experiment is that the results are spe- 
cific to the assumptions used to generate the simulated data. If you change the sample 
size, include (or delete) an additional parameter in the data-generating process, or use 
alternative initial conditions, an entirely new simulation needs to be performed. More- 
over, the precision of your estimates depends on the number of replications you use. 
Oftentimes, you do not need many replications to obtain a good estimate of a popu- 
lation mean. However, it is necessary to use many thousands of replications to obtain 
good estimates of critical values. Nevertheless, you should be able to envision many 
applications of Monte Carlo experiments. As discussed in Hendry, Neale, and Erics- 
son (1990), they are particularly helpful for studying the small-sample properties of 
time-series data. As you will see shortly, Monte Carlo experiments are the workhorse 
of many tests used in modern time-series analysis. 


Example of the Monte Carlo Method 


Suppose you did not know the probability distribution for the sum of the roll of two
dice. One way to calculate the probability distribution would be to buy a pair of dice
and roll them several thousand times. If the dice were fair, you would find that the
relative frequencies of the sums of your rolls approximate this result:

Sum           2     3     4     5     6     7     8     9     10    11    12
Probability   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Instead of actually rolling the dice, you can easily replicate the experiment on a
computer. You could draw a random number from a uniform [0, 1] distribution to replicate
the roll of the first die. If the computer-generated number falls within the interval
[0, 1/6], set the variable $r_1 = 1$. Similarly, if the number falls within the interval [1/6,
2/6], set $r_1 = 2$, and so on. In this way, $r_1$ will be some integer 1 through 6, each with a
probability 1/6. Next, draw a second number from the same uniform [0, 1] distribution
to represent the roll of die 2 ($r_2$). You complete your first Monte Carlo replication by
computing the sum $r_1 + r_2$. If you compute several thousand such sums, the sample
distribution of the sums will approximate the true distribution.
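The dice experiment just described might be coded as follows. The number of replications (100,000) and the seed are arbitrary choices, and mapping a uniform draw $u$ to a face through `floor(6u) + 1` is one convenient way to implement the interval rule above.

```python
import numpy as np

rng = np.random.default_rng(5)
n_reps = 100_000

u1 = rng.uniform(0.0, 1.0, n_reps)        # uniform draws for die 1
u2 = rng.uniform(0.0, 1.0, n_reps)        # uniform draws for die 2
r1 = np.floor(6 * u1).astype(int) + 1     # [0, 1/6) -> 1, [1/6, 2/6) -> 2, ...
r2 = np.floor(6 * u2).astype(int) + 1
sums = r1 + r2

freq = {s: float(np.mean(sums == s)) for s in range(2, 13)}   # simulated frequencies
exact = {s: (6 - abs(s - 7)) / 36 for s in range(2, 13)}      # 1/36, 2/36, ..., 6/36, ..., 1/36

for s in range(2, 13):
    print(s, round(freq[s], 4), round(exact[s], 4))
```

With this many replications the simulated frequencies should agree with the exact probabilities to roughly two decimal places.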

Of course, more complicated experiments are possible. It is interesting to note that 
this method was used to reform a standard recommendation at the blackjack tables. At 
one time, the recommendation was to “stick” if the dealer shows a 2 or a 3 and you 
hold a 12. Monte Carlo experiments of a game of blackjack showed that this recom- 
mendation was incorrect. Now, a sharp blackjack player will take another card in these 
circumstances. 


Generating the Dickey-Fuller Distribution 


We need to modify the procedure above only slightly to obtain the Dickey and Fuller
(1979) distribution. To generate the distribution for a sample size of 100, we can
perform the following steps:


STEP 1: First, we need a set of random numbers to represent the $\{\varepsilon_t\}$ sequence. If we
        use the usual set of assumptions, we can draw a set of 100 random numbers
        from a standard normal distribution. Of course, the Monte Carlo method
        would allow us to experiment with other distributions.


STEP 2: We need to generate the sequence $y_t = y_{t-1} + \varepsilon_t$. Note that we need to
        initialize the value of $y_0$. Once we draw the value of $\varepsilon_1$, we cannot construct $y_1$
        without positing some value for $y_0$. However, we do not want the results
        to be sensitive to the initial value chosen for the series. Two slightly different
        procedures are used to purge the effects of the initial condition from
        the Monte Carlo results. First, you can initialize the value of $y_0$ to equal the
        unconditional mean of the $\{y_t\}$ sequence. Alternatively, suppose you want to
        generate $T$ values of the $\{y_t\}$ sequence. You can pick an initial condition for
        $y_0$ and then generate the next $T + 50$ realizations. Discard the first 50 realizations
        and use only the last $T$ values. The idea is that the effect of the initial
        condition will dissipate after 50 periods.


STEP 3: We need to estimate the model under the alternative hypothesis. As such, we
        estimate an equation of the form $\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$. Obtain the $t$-statistic
        for the null hypothesis $\gamma = 0$. Note that the data are generated under the null
        hypothesis of a unit root and estimated under the alternative hypothesis.


STEP 4: Repeat steps 1 through 3 at least 10,000 times. If you use a sample size such that
        $T = 100$, you should obtain something very similar to the Dickey-Fuller $\tau_\mu$
        distribution plotted in Figure 4.6. Of course, you will not obtain the exact
        numbers used in the figure since you will be using a different set of random
        numbers.
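Steps 1 through 4 can be sketched as follows. This is one possible implementation, not the program used to draw Figure 4.6: it uses the second initialization procedure from Step 2 (a 50-period burn-in), and the function names, the seed, and the use of NumPy are assumptions made for the illustration.

```python
import numpy as np

def df_t_stat(y):
    """t-statistic on gamma in  dy_t = a0 + gamma*y_{t-1} + e_t  (Step 3)."""
    dy, lag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones(len(lag)), lag])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ dy
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    return beta[1] / np.sqrt(s2 * XtX_inv[1, 1])

def simulate_df_distribution(T=100, n_reps=10_000, burn=50, seed=6):
    rng = np.random.default_rng(seed)
    stats = np.empty(n_reps)
    for r in range(n_reps):                       # Step 4: many replications
        eps = rng.standard_normal(T + burn)       # Step 1: draw the innovations
        # Step 2: build the random walk and drop the burn-in, keeping T + 1
        # levels so that the regression uses T observations on dy_t
        y = np.cumsum(eps)[burn - 1:]
        stats[r] = df_t_stat(y)                   # Step 3: estimate and store the t-statistic
    return stats

stats = simulate_df_distribution()
pcts = np.percentile(stats, [1, 5, 10])
print("mean of the t-statistics:", stats.mean())
print("1%, 5%, 10% percentiles: ", pcts)
```

The simulated percentiles should land close to the Dickey-Fuller values of −3.51, −2.89, and −2.58, although the exact numbers depend on the random draws.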


FIGURE 4.6 The Dickey-Fuller Distribution

The data used to draw Figure 4.6 contain 10,000 replications. Additional replications
would reveal a somewhat smoother probability distribution. As you might expect,
the mean of the distribution is far below zero. The mean of the $t$-statistics shown in the
figure is $-1.53$. The distribution of $t$-statistics for the null hypothesis $\gamma = 0$ is only
slightly different from those reported by Dickey and Fuller; about 95% are greater than
$-2.89$ and 99% are greater than $-3.51$. Hence, if you estimate a model in the form
$\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$ and find that the $t$-statistic for the null hypothesis $\gamma = 0$ is $-3.00$,
you can reject the null hypothesis of a unit root at the 5%, but not at the 1%, level of
significance. We will encounter a number of additional applications of Monte Carlo
experiments throughout the text. Additional details of Monte Carlo and bootstrapping
techniques are discussed in Section 4.3 of the Supplementary Manual and in the
Programming Manual.


5. DICKEY-FULLER TESTS 


The last section outlined a simple procedure to determine whether $a_1 = 1$ in the model
$y_t = a_1 y_{t-1} + \varepsilon_t$. Begin by subtracting $y_{t-1}$ from each side of the equation in order
to write the equivalent form: $\Delta y_t = \gamma y_{t-1} + \varepsilon_t$ where $\gamma = a_1 - 1$. Of course, testing
the hypothesis $a_1 = 1$ is equivalent to testing the hypothesis $\gamma = 0$. Dickey and Fuller
(1979) actually consider three different regression equations that can be used to test for
the presence of a unit root:

$$\Delta y_t = \gamma y_{t-1} + \varepsilon_t \qquad (4.20)$$
$$\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t \qquad (4.21)$$
$$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \varepsilon_t \qquad (4.22)$$


The difference between the three regressions concerns the presence of the deterministic
elements $a_0$ and $a_2 t$. The first is a pure random walk model, the second adds
an intercept or a drift term, and the third includes both a drift and a linear time trend.

The parameter of interest in all the regression equations is $\gamma$; if $\gamma = 0$, the
$\{y_t\}$ sequence contains a unit root. The test involves estimating one (or more) of
the equations above using OLS in order to obtain the estimated value of $\gamma$ and the
associated standard error. Comparing the resulting $t$-statistic with the appropriate
value reported in the Dickey-Fuller tables allows the researcher to determine whether
to accept or reject the null hypothesis $\gamma = 0$.

Recall that, in (4.18), the estimate of $y_t = a_1 y_{t-1} + \varepsilon_t$ was such that $a_1 = 0.9546$
with a standard error of 0.030. Clearly, the OLS regression in the form $\Delta y_t = \gamma y_{t-1} + \varepsilon_t$
will yield an estimate of $\gamma$ equal to $-0.0454$ with the same standard error of
0.030. Hence, the associated $t$-statistic for the hypothesis $\gamma = 0$ is $-1.5133$ (i.e.,
$-0.0454/0.03 = -1.5133$).

The methodology is precisely the same regardless of which of the three forms of the
equations is estimated. However, be aware that the critical values of the $t$-statistics do
depend on whether an intercept and/or time trend is included in the regression equation.
In their Monte Carlo study, Dickey and Fuller (1979) found that the critical values for
$\gamma = 0$ depend on the form of the regression and sample size. The statistics called $\tau$, $\tau_\mu$,
and $\tau_\tau$ are the appropriate statistics to use for (4.20), (4.21), and (4.22), respectively.

Now, look at Table A in the Supplementary Manual. With 100 observations, there
are three different critical values for the $t$-statistic for $\gamma = 0$. For a regression without the
intercept and trend terms ($a_0 = a_2 = 0$), use the section labeled $\tau$. With 100 observations,
the critical values for the $t$-statistic are $-1.61$, $-1.95$, and $-2.60$ at the 10%,
5%, and 1% significance levels, respectively. Thus, in the hypothetical example with
$\gamma = -0.0454$ and a standard error of 0.03 (so that $t = -1.5133$), it is not possible to
reject the null of a unit root at conventional significance levels. Note that the appropriate
critical values depend on sample size. As in most hypothesis tests, for any given level
of significance, the critical values of the $t$-statistic decrease in absolute value as sample
size increases.

Including an intercept term but not a trend term (only $a_2 = 0$) necessitates the
use of the critical values in the section labeled $\tau_\mu$. Estimating (4.19) in the form
$\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$ necessarily yields a value of $\gamma$ equal to $(0.9247 - 1) = -0.0753$
with a standard error of 0.037. The appropriate calculation for the $t$-statistic yields
$-0.0753/0.037 = -2.035$. If we read from the appropriate row of Table A, with the
same 100 observations, the critical values are $-2.58$, $-2.89$, and $-3.51$ at the 10%,
5%, and 1% significance levels, respectively. Again, the null of a unit root cannot be
rejected at conventional significance levels. Finally, with both intercept and trend,
use the critical values in the section labeled $\tau_\tau$; now, the critical values are $-3.45$
and $-4.04$ at the 5% and 1% significance levels, respectively. The equation was not
estimated using a time trend; inspection of Figure 4.5 indicates that there is little
reason to include a deterministic trend in the estimating equation.

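As discussed in Section 7, these critical values are unchanged if (4.20) to (4.22) are
replaced by the autoregressive processes:

$$\Delta y_t = \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \qquad (4.23)$$

$$\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \qquad (4.24)$$

$$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \qquad (4.25)$$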


Tests including lagged changes are called augmented Dickey-Fuller tests and
the same $\tau$, $\tau_\mu$, and $\tau_\tau$ statistics are all used to test the hypotheses $\gamma = 0$. Dickey and
Fuller (1981) provide three additional $F$-statistics (called $\phi_1$, $\phi_2$, and $\phi_3$) to test joint
hypotheses on the coefficients. Using (4.21) or (4.24), the null hypothesis $\gamma = a_0 = 0$
is tested using the $\phi_1$ statistic. Including a time trend in the regression, so that (4.22)
or (4.25) is estimated, the joint hypothesis $a_0 = \gamma = a_2 = 0$ is tested using the $\phi_2$
statistic and the joint hypothesis $\gamma = a_2 = 0$ is tested using the $\phi_3$ statistic.
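A regression of the form (4.24) or (4.25) can be set up by hand as sketched below. The function `adf_t_stat` and its arguments are hypothetical names introduced for this illustration; it returns only the $t$-statistic on $\gamma$, which must still be compared with the Dickey-Fuller critical values rather than with a standard $t$ table. The stationary AR(1) series used to exercise it is also an assumption of the example.

```python
import numpy as np

def adf_t_stat(y, p=2, trend=False):
    """t-statistic on gamma in
    dy_t = a0 + gamma*y_{t-1} (+ a2*t) + sum_{i=2}^{p} beta_i*dy_{t-i+1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n_lags = p - 1                                   # number of lagged changes
    dep = dy[n_lags:]                                # dependent variable dy_t
    cols = [np.ones(len(dep)), y[n_lags:-1]]         # intercept and y_{t-1}
    if trend:
        cols.append(np.arange(1, len(dep) + 1, dtype=float))   # linear time trend
    for j in range(1, n_lags + 1):
        cols.append(dy[n_lags - j: len(dy) - j])     # dy_{t-j}
    X = np.column_stack(cols)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ dep
    resid = dep - X @ beta
    s2 = resid @ resid / (len(dep) - X.shape[1])
    return beta[1] / np.sqrt(s2 * XtX_inv[1, 1])

# Illustrative data: a stationary AR(1) with coefficient 0.5 (assumed for the example)
rng = np.random.default_rng(7)
e = rng.standard_normal(500)
y_stat = np.zeros(500)
for t in range(1, 500):
    y_stat[t] = 0.5 * y_stat[t - 1] + e[t]

t_mu = adf_t_stat(y_stat, p=2)                # form (4.24)
t_tau = adf_t_stat(y_stat, p=2, trend=True)   # form (4.25)
print(t_mu, t_tau)
```

For a clearly stationary series such as this one, both statistics should lie well below the 5% critical values of −2.89 and −3.45.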
Table 4.2 Summary of the Dickey-Fuller Tests

                                                                                     Test        Critical Values for 95% and
Model                                                         Hypothesis             Statistic   99% Confidence Intervals
$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \varepsilon_t$   $\gamma = 0$               $\tau_\tau$   $-3.45$ and $-4.04$
                                                              $\gamma = a_2 = 0$         $\phi_3$      6.49 and 8.73
                                                              $a_0 = \gamma = a_2 = 0$   $\phi_2$      4.88 and 6.50
$\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$           $\gamma = 0$               $\tau_\mu$    $-2.89$ and $-3.51$
                                                              $a_0 = \gamma = 0$         $\phi_1$      4.71 and 6.70
$\Delta y_t = \gamma y_{t-1} + \varepsilon_t$                 $\gamma = 0$               $\tau$        $-1.95$ and $-2.60$

Note: Critical values are for a sample size of 100.


The $\phi_1$, $\phi_2$, and $\phi_3$ statistics are constructed in exactly the same way as ordinary
$F$-tests:

$$\phi_i = \frac{[SSR(\text{restricted}) - SSR(\text{unrestricted})]/r}{SSR(\text{unrestricted})/(T - k)}$$

where SSR(restricted) and SSR(unrestricted) = the sums of the squared residuals
                                              from the restricted and unrestricted
                                              models, respectively,
      $r$ = number of restrictions,
      $T$ = number of usable observations, and
      $k$ = number of parameters estimated in the unrestricted model.

Hence, $T - k$ = degrees of freedom in the unrestricted model.

Comparing the calculated value of $\phi_i$ to the appropriate value reported in Dickey
and Fuller (1981) allows you to determine the significance level at which the restriction
is binding. The null hypothesis is that the data are generated by the restricted model,
and the alternative hypothesis is that the data are generated by the unrestricted model.
If the restriction is not binding, SSR(restricted) should be close to SSR(unrestricted)
and $\phi_i$ should be small; hence, large values of $\phi_i$ suggest a binding restriction and a
rejection of the null hypothesis. Thus, if the calculated value of $\phi_i$ is smaller than that
reported by Dickey and Fuller, you can accept the restricted model (i.e., you do not
reject the null hypothesis that the restriction is not binding). If the calculated value of
$\phi_i$ is larger than that reported by Dickey and Fuller, you can reject the null hypothesis
and conclude that the restriction is binding. The critical values of the three $\phi_i$ statistics
are reported in Table B in the Supplementary Manual. The complete set of test statistics
and their critical values for a sample size of 100 is summarized in Table 4.2.


An Example 


To illustrate the use of the various test statistics, Dickey and Fuller (1981) use quarterly
values of the logarithm of the Federal Reserve Board’s Production Index over the
1950Q1-1977Q4 period to estimate the following three equations:

$$\Delta y_t = \underset{(0.15)}{0.52} + \underset{(0.00034)}{0.00120}\,t - \underset{(0.033)}{0.119}\,y_{t-1} + \underset{(0.081)}{0.498}\,\Delta y_{t-1} + \varepsilon_t \qquad SSR = 0.056448 \qquad (4.26)$$

$$\Delta y_t = \underset{(0.0025)}{0.0054} + \underset{(0.083)}{0.447}\,\Delta y_{t-1} + \varepsilon_t \qquad SSR = 0.063211 \qquad (4.27)$$

$$\Delta y_t = \underset{(0.079)}{0.511}\,\Delta y_{t-1} + \varepsilon_t \qquad SSR = 0.065966 \qquad (4.28)$$

where SSR = sum of squared residuals and standard errors are in parentheses.

To test the null hypothesis that the data are generated by (4.28) against the alternative
that (4.26) is the “true” model, use the $\phi_2$ statistic. Dickey and Fuller test the
null hypothesis $a_0 = a_2 = \gamma = 0$ as follows. Note that the residual sums of squares of
the restricted and unrestricted models are 0.065966 and 0.056448, respectively, and
that the null hypothesis entails three restrictions. With 110 usable observations and
4 estimated parameters, the unrestricted model contains 106 degrees of freedom. Since
$0.056448/106 = 0.000533$, the $\phi_2$ statistic is given by

$$\phi_2 = (0.065966 - 0.056448)/[3(0.000533)] = 5.95$$

With 110 observations, the critical value of $\phi_2$ calculated by Dickey and Fuller is
5.59 at the 2.5% significance level. Hence, it is possible to reject the null hypothesis
of a random walk against the alternative that the data contain an intercept and/or a unit
root and/or a deterministic time trend (i.e., rejecting $a_0 = a_2 = \gamma = 0$ means that one
or more of these parameters does not equal zero).

Dickey and Fuller also test the null hypothesis a, = y = 0 given the alternative 
of (4.26). If we now view (4.27) as the restricted model and (4.26) as the unrestricted 
model, the @3 statistic is calculated as 


3 = (0.063211 — 0.056448) /[2(0.000533)] = 6.34 


With 110 observations, Table B indicates that the critical value of $\phi_3$ is 6.49 at the
5% significance level and 5.47 at the 10% significance level. At the 10% level, they
reject the null hypothesis and accept the alternative that the series is TS. However, at
the 5% level, the calculated value of $\phi_3$ is smaller than the critical value of 6.49; at
this significance level, they do not reject the null hypothesis. Hence, at the 5% significance
level, they maintain the hypothesis that the series contains a unit root and/or a
deterministic time trend.
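
The two F-type calculations above can be reproduced in a few lines. The following Python sketch is only an illustration; the function name `phi_statistic` and its argument names are hypothetical and do not come from the text or from any statistical package.

```python
def phi_statistic(ssr_restricted, ssr_unrestricted, n_restrictions, n_obs, n_params):
    """Dickey-Fuller phi statistic, constructed exactly like an ordinary F-statistic."""
    dof = n_obs - n_params  # degrees of freedom of the unrestricted model
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    return numerator / (ssr_unrestricted / dof)


# SSRs of (4.28), (4.27), and (4.26) as reported in the example
phi2 = phi_statistic(0.065966, 0.056448, n_restrictions=3, n_obs=110, n_params=4)
phi3 = phi_statistic(0.063211, 0.056448, n_restrictions=2, n_obs=110, n_params=4)
print(f"phi2 = {phi2:.3f}, phi3 = {phi3:.3f}")
```

Because the code does not round 0.056448/106 to 0.000533, the results can differ from 5.95 and 6.34 in the second decimal place. Only the construction is that of an F-statistic; the values must still be compared with the Dickey-Fuller critical values rather than with the standard F tables.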

To compare with the $\tau_\tau$ test (i.e., the hypothesis that only $\gamma = 0$), note that

$$\tau_\tau = -0.119/0.033 = -3.61$$

so that it is possible to reject the null of a unit root at the 5% level.
A number of examples and tips about the test are given in Chapter 6 of the 
Programming Manual that accompanies the text.

6. EXAMPLES OF THE DICKEY-FULLER TEST


Section 2 reviewed the evidence reported by Nelson and Plosser (1982) suggesting that 
macroeconomic variables are DS rather than trend stationary. We are now in a position 
to consider their formal tests of the hypothesis. For each series under study, Nelson and 
Plosser estimated the regression in the form of (4.25): 


$$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t$$

The chosen lag lengths are reported in the column labeled p in Table 4.3. The
estimated values of $a_0$, $a_2$, and $\gamma$ are reported in columns 3, 4, and 5, respectively.

Recall that the old school view of business cycles maintains that GNP and production
levels are trend stationary rather than DS. An adherent to this view must assert
that $\gamma$ is different from zero; if $\gamma = 0$, the series has a unit root and is DS. Given the
sample sizes used by Nelson and Plosser (1982), at the 0.05 level, the critical value of
the t-statistic for the null hypothesis $\gamma = 0$ is −3.45. Thus, only if the estimated value
of $\gamma$ is more than 3.45 standard deviations below zero is it possible to reject the hypothesis
that $\gamma = 0$. As can be seen from inspection of Table 4.3, the estimated values of
$\gamma$ for real GNP, nominal GNP, and industrial production are not statistically different
from zero. Only the unemployment rate has an estimated value of $\gamma$ that is significantly
different from zero at the 0.05 level.
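
The decision rule is one-sided: the null of a unit root is rejected only when the t-statistic lies below the critical value. A minimal Python sketch, using the t-statistics on $\gamma$ reported in Table 4.3 (the variable names are illustrative assumptions), makes the comparison explicit.

```python
# t-statistics for the null gamma = 0, as reported in Table 4.3
t_gamma = {
    "Real GNP": -2.99,
    "Nominal GNP": -2.32,
    "Industrial production": -2.53,
    "Unemployment rate": -3.55,
}
CRITICAL_5PCT = -3.45  # Dickey-Fuller critical value quoted in the text

# Reject the unit root only if the statistic is more negative than the critical value.
rejects = [name for name, t in t_gamma.items() if t < CRITICAL_5PCT]
print(rejects)
```

Note that a value such as −2.99 would be judged significant against the usual t tables, which is why the Dickey-Fuller critical value matters here.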


Quarterly Real U.S. GDP 


Now use the data on the file RGDP.XLS to estimate the logarithmic change in real 
GDP as 


$$\Delta lrgdp_t = \underset{(1.58)}{0.1248} + \underset{(1.31)}{0.0001}\,t - \underset{(-1.49)}{0.0156}\,lrgdp_{t-1} + \underset{(6.26)}{0.3663}\,\Delta lrgdp_{t-1} \tag{4.29}$$

Table 4.3 The Tests by Nelson and Plosser for Unit Roots

                         p     a_0       a_2        γ           γ + 1
Real GNP                 2     0.819     0.006      −0.175      0.825
                               (3.03)    (3.03)     (−2.99)
Nominal GNP              2     1.06      0.006      −0.101      0.899
                               (2.37)    (2.34)     (−2.32)
Industrial production    6     0.103     0.007      −0.165      0.835
                               (4.32)    (2.44)     (−2.53)
Unemployment rate        4     0.513     −0.000     −0.294*     0.706
                               (2.81)    (−0.23)    (−3.55)


Notes:

¹ p is the chosen lag length. Coefficients divided by their standard errors are in parentheses. Thus,
entries in parentheses represent the t-test for the null hypothesis that a coefficient is equal to zero.
Under the null of nonstationarity, it is necessary to use the Dickey–Fuller critical values. At the 0.05
significance level, the critical value for the t-statistic is −3.45.

² An (*) denotes significance at the 0.05 level. For real and nominal GNPs and industrial production, it is
not possible to reject the null hypothesis γ = 0 at the 0.05 level. Only the unemployment rate appears
to be stationary.

³ The expression γ + 1 is the estimate of $a_1$.

The t-statistic on the coefficient for $lrgdp_{t-1}$ is −1.49. Table A indicates that, with
244 usable observations, the 10% and 5% critical values of $\tau_\tau$ are about −3.13 and
−3.43, respectively. As such, we cannot reject the null hypothesis of a unit root. The
sample value of $\phi_3$ for the null hypothesis $a_2 = \gamma = 0$ is 2.97. As Table B indicates that
the 10% critical value is 5.39, we cannot reject the joint hypothesis of a unit root and no
deterministic time trend. Since the sample value of $\phi_2$ (equal to 17.61) far exceeds the
5% critical value of 4.75, we do not want to exclude the drift term. We can conclude
that the growth rate of the real GDP series acts as a random walk plus drift plus the
irregular term $0.3663\Delta lrgdp_{t-1}$. Additional details are contained in Section 4.4 of the
Supplementary Manual.


Unit Roots and Purchasing Power Parity 


Purchasing power parity (PPP) is a simple relationship linking national price levels and 
exchange rates. In its simplest form, PPP asserts that the rate of currency depreciation 
is approximately equal to the difference between domestic and foreign inflation rates. 
If $p_t$ and $p_t^*$ denote the logarithms of U.S. and foreign price levels and $e_t$ denotes the
logarithm of the dollar price of foreign exchange, PPP implies

$$e_t = p_t - p_t^* + d_t$$

where $d_t$ represents the deviation from PPP in period t.

In applied work, $p_t$ and $p_t^*$ usually refer to national price indices in t relative to
a base year, so that $e_t$ refers to an index of the domestic currency price of foreign
exchange relative to a base year. For example, if the U.S. inflation rate is 10% while the
foreign inflation rate is 15%, the dollar price of foreign exchange should fall by approximately
5%. The presence of the term $d_t$ allows for short-run deviations from PPP.

Because of its simplicity and intuitive appeal, PPP has been used extensively in
theoretical models of exchange rate determination. However, as in the well-known
Dornbusch (1976) “overshooting” model, real economic shocks, such as productivity
or demand shocks, can cause permanent deviations from PPP. For our purposes, the
theory of PPP serves as an excellent vehicle to illustrate many time-series testing procedures.
One test of long-run PPP is to determine whether $d_t$ is stationary. After all, if the
deviations from PPP are nonstationary (i.e., if the deviations are permanent in nature),
we can reject the theory. Note that PPP does allow for persistent deviations; the autocorrelations
of the $\{d_t\}$ sequence need not be zero. One popular testing procedure is to
define the “real” exchange rate in period t:

$$r_t \equiv e_t + p_t^* - p_t$$
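
A series of this form can be built directly from index levels. The sketch below is a hedged illustration: the function name and the three short lists of index values are hypothetical and are not data from the text.

```python
import math


def log_real_exchange_rate(dollar_price_index, foreign_price_index, us_price_index):
    """Return r_t = e_t + p*_t - p_t, with each term the log of an index level."""
    return [
        math.log(e) + math.log(p_star) - math.log(p)
        for e, p_star, p in zip(dollar_price_index, foreign_price_index, us_price_index)
    ]


# Hypothetical index values (base period = 1.00); these are not data from the text.
dollar_price = [1.00, 0.95, 0.90]
foreign_prices = [1.00, 1.15, 1.30]
us_prices = [1.00, 1.10, 1.20]
r = log_real_exchange_rate(dollar_price, foreign_prices, us_prices)
print([round(value, 4) for value in r])
```

By construction the series equals zero in the base period, so later values measure the cumulative log deviation from PPP relative to that period.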


Long-run PPP is said to hold if the $\{r_t\}$ sequence is stationary. For example, in
Enders (1988), I constructed real exchange rates for three major U.S. trading partners: 
Germany, Canada, and Japan. The data were divided into two periods: January 1960 to 
April 1971 (representing the fixed exchange rate period) and January 1973 to November 
1986 (representing the flexible exchange rate period). Each nation’s Wholesale Price 
Index (WPI) was multiplied by an index of the U.S. dollar price of the foreign currency 
and then divided by the U.S. WPI. The log of the constructed series is the $\{r_t\}$ sequence.

FIGURE 4.7 Real Exchange Rates


A critical first step in any econometric analysis is to visually inspect the data. The 
plots of the three real exchange rate series during the flexible exchange rate period 
(through 1989) are shown in Figure 4.7. Each series seems to meander in a fashion 
characteristic of a random walk process. Notice that there is little visual evidence of 
explosive behavior or a deterministic time trend. The autocorrelation functions for all
of the series in the analysis look similar to that in Figure 4.5. In particular, the autocorrelation
functions show little tendency to decay while the autocorrelations of the first
differences display the classic pattern of a stationary series.

To formally test for the presence of a unit root in the real exchange rates, augmented
Dickey–Fuller tests of the form given by (4.24) were conducted. The regression
$\Delta r_t = a_0 + \gamma r_{t-1} + \beta_2 \Delta r_{t-1} + \beta_3 \Delta r_{t-2} + \cdots + \varepsilon_t$ was estimated based on the following
considerations:


1. The theory of PPP does not allow for a deterministic time trend. Any such
findings would refute the theory as posited. Given that the series all decline
throughout the early 1980s and all increase during the middle to late 1980s,
there is no reason to entertain the notion of trend stationarity. As such, the
expression $a_2 t$ was not included in the estimating equation.

2. For the fixed exchange rate period, various lag length tests indicated that all
values of $\beta_i$ could be set equal to zero for all three countries. However, different
lag length tests yielded ambiguous results for the flexible exchange rate
period. Lag length tests indicated that a high-order lag coefficient was statistically
different from zero for all three countries. In contrast, F-tests and the SBC selected
two lags for Germany and Japan and no lagged changes for Canada. As such, for the
flexible rate period, the Dickey–Fuller tests were conducted using two different
lag lengths for each country.


For the Canadian case during the 1973–1986 period, the t-statistic for the null
hypothesis that $\gamma = 0$ is −1.42 using no lags and −1.51 using all 11 lags. Given the
critical value of the $\tau_\mu$ statistic, it is not possible to reject the null of a unit root in the
Canadian/U.S. real exchange rate series. Hence, PPP fails for the Canadian–U.S. case.
In the 1960–1971 period, the calculated value of the t-statistic is −1.59; again, it is
possible to conclude that PPP fails.

Table 4.4 reports the results of all six estimations using the short lag lengths sug- 
gested by the F-tests and the SBC. Notice the following properties of the estimated 
models: 


1. For all six models, it is not possible to reject the null hypothesis that PPP fails.
As can be seen from the third column of Table 4.4, the absolute value of the
t-statistic for the null $\gamma = 0$ is never more than 1.59. The economic interpretation
is that real productivity and/or demand shocks have had a permanent
influence on real exchange rates.

2. As measured by the sample SD, real exchange rates were far more volatile in
the 1973–1986 period than in the 1960–1971 period. Moreover, as measured
by the standard error of the estimate (SEE), real exchange rate volatility is
associated with unpredictability. The SEE during the flexible exchange rate
period is several hundred times that of the fixed rate period. It seems reasonable
to conclude that the change in the exchange rate regime (i.e., the end of
Bretton Woods) affected the volatility of the real exchange rate.


Table 4.4 Real Exchange Rate Estimation

             γ¹         H0: γ = 0²    Lags   Mean³   ρ/DW      F       SD/SEE
1973–1986
  Canada     −0.022     t = −1.42     0      1.05    0.059     0.194   5.47
             (0.016)                                 1.88              1.16
  Japan      −0.047     t = −0.64     2      1.01    −0.007    0.226   10.44
             (0.074)                                 2.01              2.81
  Germany    −0.027     t = −0.28     2      1.11    −0.014    0.858   20.68
             (0.076)                                 2.04              3.71
1960–1971
  Canada     −0.031     t = −1.59     0      1.02    −0.107    0.434   0.014
             (0.019)                                 2.21              0.004
  Japan      −0.030     t = −1.04     0      0.98    0.046     0.330   0.017
             (0.028)                                 1.98              0.005
  Germany    −0.016     t = −1.23     0      1.01    0.038     0.097   0.026
             (0.012)                                 1.93              0.004

Notes:

¹ Standard errors are in parentheses.

² Entries are the t-statistic for the hypothesis γ = 0.

³ Mean is the sample mean of the series. SD is the standard deviation of the real exchange rate. SEE
is the estimated standard deviation of the residuals (i.e., the standard error of the estimate). F is the
significance level of the test that lags 2 (or 3) through 12 can be excluded. DW is the Durbin–Watson
statistic for first-order serial correlation, and ρ is the estimated autocorrelation coefficient.


3. Care must be taken to keep the appropriate null hypothesis in mind. Under
the null of a unit root, classical test procedures are inappropriate, and we
resort to the statistics tabulated by Dickey and Fuller. However, classical test
procedures (which assume stationary variables) are appropriate under the
null that the real rates are stationary. Thus, the following possibility arises:
Suppose that the t-statistic in the Canadian case happened to be −2.16 instead
of −1.42. If you used the Dickey–Fuller critical values, you would not reject
the null of a unit root. Hence, you could conclude that PPP fails. However,
under the null of stationarity (where we can use classical procedures), $\gamma$ is
more than two standard deviations from zero and you would not reject the
null of stationarity.

This apparent dilemma commonly occurs when analyzing series with 
roots close to unity in absolute value. Unit root tests do not have much power 
in discriminating between characteristic roots close to unity and actual unit 
roots. The dilemma is only apparent since the two null hypotheses are quite 
different. It is perfectly consistent to maintain a null that PPP holds and not 
be able to reject a null that PPP fails! Notice that this dilemma does not arise 
for any of the series reported in Table 4.4; for each, it is not possible to reject
a null of $\gamma = 0$ at conventional significance levels.

One way to circumvent this problem is to directly test the null hypothesis
of stationarity against the alternative of nonstationarity. Kwiatkowski, Phillips,
Schmidt, and Shin (1992) show how to perform this type of test.


4. Looking at some of the diagnostic statistics, the F-statistics all indicate
that it is appropriate to exclude lags 2 (or 3) through 12 from the regression
equation. To reinforce the use of short lags, notice that the first-order
correlation coefficient of the residuals ($\rho$) is low and that the Durbin–Watson
statistic is close to two. It is interesting that the point estimates of the characteristic
roots all indicate that real exchange rates are convergent. To obtain
the characteristic roots, rewrite the estimated equations in the autoregressive
form $r_t = a_0 + a_1 r_{t-1}$ or $r_t = a_0 + a_1 r_{t-1} + a_2 r_{t-2}$. For the four AR(1)
models, the point estimates of the slope coefficients are all less than unity.
In the post-Bretton Woods period (1973–1986), the point estimates of the
characteristic roots of Japan's second-order process are 0.931 and 0.319; for
Germany, the roots are 0.964 and 0.256. Yet, this is precisely what we would
expect if PPP fails; under the null of a unit root, we know that $\gamma$ is biased
downward.
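
The mapping from the estimated Dickey-Fuller regression to the characteristic roots can be sketched as follows. With one lagged change, $\Delta r_t = a_0 + \gamma r_{t-1} + \beta_2 \Delta r_{t-1} + \varepsilon_t$ implies $a_1 = 1 + \gamma + \beta_2$ and $a_2 = -\beta_2$. The function name is hypothetical, and the value $\beta_2 = 0.297$ for Japan is an assumption back-solved from the reported roots; it is not reported in Table 4.4.

```python
import numpy as np


def characteristic_roots(gamma, beta2=None):
    """Roots implied by dr_t = a0 + gamma*r_{t-1} (+ beta2*dr_{t-1}) + e_t."""
    if beta2 is None:
        return np.array([1.0 + gamma])        # AR(1): a1 = 1 + gamma
    a1 = 1.0 + gamma + beta2                   # coefficient on r_{t-1}
    a2 = -beta2                                # coefficient on r_{t-2}
    roots = np.roots([1.0, -a1, -a2])          # solve z^2 - a1*z - a2 = 0
    return roots[np.argsort(-np.abs(roots))]   # largest modulus first


canada = characteristic_roots(-0.022)          # gamma from Table 4.4, no lags
# beta2 = 0.297 is an assumed value, back-solved from the roots quoted in the text.
japan = characteristic_roots(-0.047, beta2=0.297)
print(canada, japan)
```

The roots are inside the unit circle whenever the point estimate of $\gamma$ is negative and small, which is why convergent point estimates alone say little about whether a unit root is present.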

To update the study, the file PANEL.XLS contains quarterly values of the real
effective exchange rates (CPI based) for Australia, Canada, France, Germany, Japan,
Netherlands, the United Kingdom, and the United States over the 1980Q1–2013Q1
period. These are multilateral (not bilateral) real exchange rates. As an exercise, you
should use these data to verify that very little has changed. You should find that only
for France and the Netherlands is it possible to reject a unit root in the real exchange
rate at the 5% significance level. Try not to peek; for each country, the estimated value
of $\gamma$ and the appropriate lag length are reported in Table 4.8.


7. EXTENSIONS OF THE DICKEY-FULLER TEST 


Not all time-series variables can be well represented by the first-order autoregressive
process $\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \varepsilon_t$. It is possible to use the Dickey–Fuller tests in
higher-order equations such as (4.23)–(4.25). Consider the pth-order autoregressive process:

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_3 y_{t-3} + \cdots + a_{p-2} y_{t-p+2} + a_{p-1} y_{t-p+1} + a_p y_{t-p} + \varepsilon_t$$


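To best understand the methodology of the augmented Dickey–Fuller (ADF)
test, add and subtract $a_p y_{t-p+1}$ to obtain

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_3 y_{t-3} + \cdots + a_{p-2} y_{t-p+2} + (a_{p-1} + a_p) y_{t-p+1} - a_p \Delta y_{t-p+1} + \varepsilon_t$$

Next, add and subtract $(a_{p-1} + a_p) y_{t-p+2}$ to obtain

$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_3 y_{t-3} + \cdots + (a_{p-2} + a_{p-1} + a_p) y_{t-p+2} - (a_{p-1} + a_p) \Delta y_{t-p+2} - a_p \Delta y_{t-p+1} + \varepsilon_t$$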


Continuing in this fashion, we obtain 


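$$\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \tag{4.30}$$

where

$$\gamma = -\left(1 - \sum_{i=1}^{p} a_i\right) \qquad \text{and} \qquad \beta_i = -\sum_{j=i}^{p} a_j$$
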

In (4.30), the coefficient of interest is $\gamma$; if $\gamma = 0$, the equation is entirely in first
differences and, so, has a unit root. We can test for the presence of a unit root using the
same Dickey–Fuller statistics discussed earlier. Again, the appropriate statistic to use
depends on the deterministic components included in the regression equation. Without
an intercept or a trend, use the $\tau$ statistic; with only the intercept, use the $\tau_\mu$ statistic; and
with both intercept and trend, use the $\tau_\tau$ statistic. It is worthwhile pointing out that the
results here are perfectly consistent with our study of difference equations in Chapter 1.
If the coefficients of a difference equation sum to 1, at least one characteristic root is
unity. Here, if $\sum a_i = 1$, $\gamma = 0$, and the system has a unit root.
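
The reparameterization in (4.30) is easy to check numerically. The following sketch (the function name and the numerical values are hypothetical) maps the AR(p) coefficients into $\gamma$ and $\beta_2, \ldots, \beta_p$ and confirms that both forms give the same value of $y_t$ when $\varepsilon_t = 0$.

```python
def adf_coefficients(a):
    """Map AR(p) coefficients [a_1, ..., a_p] to (gamma, [beta_2, ..., beta_p])."""
    gamma = -(1.0 - sum(a))
    # beta_i = -(a_i + a_{i+1} + ... + a_p) for i = 2, ..., p
    betas = [-sum(a[i - 1:]) for i in range(2, len(a) + 1)]
    return gamma, betas


# Hypothetical AR(3) with coefficients summing to one, so gamma should be zero.
gamma_unit, betas_unit = adf_coefficients([0.5, 0.3, 0.2])

# Equivalence check with arbitrary values for y_{t-1}, y_{t-2}, y_{t-3}.
a0, a = 0.1, [0.6, 0.3, -0.2]
history = [1.0, 2.0, 4.0]
ar_value = a0 + sum(ai * yi for ai, yi in zip(a, history))

gamma, betas = adf_coefficients(a)
lagged_changes = [history[k] - history[k + 1] for k in range(len(a) - 1)]
adf_value = history[0] + a0 + gamma * history[0] + sum(
    b * d for b, d in zip(betas, lagged_changes)
)
print(gamma_unit, betas_unit, ar_value, adf_value)
```

The first call illustrates the point made in the text: when the autoregressive coefficients sum to one, $\gamma$ is zero and only differenced terms remain.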

Note that the Dickey—Fuller tests assume that the errors are independent and have 
a constant variance. This raises six important problems related to the fact that we do 
not know the true data-generating process: 


1. We cannot properly estimate $\gamma$ and its standard error unless all of the autoregressive
terms are included in the estimating equation. Clearly, the simple
regression $\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$ is inadequate to this task if (4.30) is the true
data-generating process. Since the true order of the autoregressive process is
unknown, the problem is to select the appropriate lag length.


2. The DGP may contain both autoregressive and moving average components. 
We need to know how to conduct the test if the order of the moving average 
terms (if any) is unknown. 


3. The Dickey—Fuller test considers only a single unit root. However, a pth 
order autoregression has p characteristic roots; if there are d < p unit roots, 
the series needs to be differenced d times to achieve stationarity. 


4. As we saw in Chapter 2, there may be roots that require first differences and 
others that necessitate seasonal differencing. We need to develop a method 
that can distinguish between these two types of unit root processes. 


5. There might be structural breaks in the data. As shown in Section 8, such 
breaks can impart an apparent trend to the data. 


6. It might not be known whether an intercept and/or time trend belongs in 
(4.30). Section 9 is concerned with the issue of the appropriate deterministic 
regressors. (Additional details are given in Section 4.4 entitled “Determinants 
of the Deterministic Regressors” in the Supplementary Manual.) 


Selection of the Lag Length 


It is important to use the correct number of lags in conducting a Dickey—Fuller test. Too 
few lags mean that the regression residuals do not behave like white-noise processes. 
The model will not appropriately capture the actual error process so that $\gamma$ and its standard
error will not be well estimated. Including too many lags reduces the power of the
test to reject the null of a unit root since the increased number of lags necessitates the 
estimation of additional parameters and a loss of degrees of freedom. The degrees of 
freedom decrease since the number of parameters estimated has increased and the num- 
ber of usable observations has decreased. (We lose one observation for each additional 
lag included in the autoregression.) As such, the presence of unnecessary lags will 
reduce the power of the Dickey—Fuller test to detect a unit root. In fact, an augmented 
Dickey—Fuller test may indicate a unit root for some lag lengths but not for others. 

How does a careful researcher select the appropriate lag length in such circum- 
stances? One approach is the general-to-specific methodology. The idea is to start 
with a relatively long lag length and pare down the model by the usual t-test and/or
F-tests. For example, one could estimate equation (4.30) using a lag length of p*. If 
the t-statistic on lag p* is insignificant at some specified critical value, reestimate the 
regression using a lag length of p* — 1. Repeat the process until the last lag is signifi- 
cantly different from zero. In the pure autoregressive case, such a procedure will yield 
the true lag length with an asymptotic probability of unity, provided the initial choice 
of lag length includes the true length. Using seasonal data, the process is a bit differ- 
ent. For example, using quarterly data, one could start with 3 years of lags (p = 12). If 
the t-statistic on lag 12 is insignificant at some specified critical value and if an F-test 
indicates that lags 9–12 are also insignificant, move to lags 1–8. Repeat the process
for lag 8 and lags 5–8 until a reasonable lag length has been determined.

Once a tentative lag length has been determined, diagnostic checking should be
conducted. As always, plotting the residuals is a most important diagnostic tool. There 
should not appear to be any strong evidence of structural change or serial correla- 
tion. Moreover, the correlogram of the residuals should appear to be white noise. The 
Ljung—Box Q-statistic should not reveal any significant autocorrelations among the 
residuals. It is inadvisable to use the alternative procedure of beginning with the most 
parsimonious model and continuing to add lags until the first insignificant lag is found. 
Monte Carlo studies show that this procedure is biased toward selecting a value of p 
that is less than the true value. 
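
The general-to-specific procedure can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a full implementation: the function names are hypothetical, the regression contains only an intercept, the simulated series is a made-up unit root process, and the cutoff of 1.96 is an illustrative choice for the t-test on the last lagged change.

```python
import numpy as np


def adf_regression(y, p):
    """OLS of dy_t on a constant, y_{t-1}, and dy_{t-1}, ..., dy_{t-p}."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                       # dy[k] = y[k+1] - y[k]
    rows = np.arange(p, len(dy))          # observations with all p lags available
    cols = [np.ones(len(rows)), y[rows]]  # constant and lagged level
    cols += [dy[rows - i] for i in range(1, p + 1)]
    X = np.column_stack(cols)
    target = dy[rows]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / se


def general_to_specific(y, p_max, crit=1.96):
    """Drop the longest lag until its t-statistic exceeds crit in absolute value."""
    for p in range(p_max, 0, -1):
        _, tstats = adf_regression(y, p)
        if abs(tstats[-1]) > crit:        # last column is the longest lagged change
            return p
    return 0


# Hypothetical unit root process whose changes follow an AR(3).
rng = np.random.default_rng(0)
T = 2000
eps = rng.standard_normal(T)
dy = np.zeros(T)
for t in range(3, T):
    dy[t] = 0.5 * dy[t - 1] + 0.2 * dy[t - 3] + eps[t]
y = np.cumsum(dy)

p_hat = general_to_specific(y, p_max=6)
print(p_hat)
```

Starting from a generous `p_max` matters: the procedure can stop too early only if the initial lag length is shorter than the true one. The t-statistic on the lagged level returned by `adf_regression` must still be compared with Dickey-Fuller critical values.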

As long as the regression equation does not omit a deterministic regressor present 
in the data-generating process, it is possible to perform lag length tests using t-tests
or F-tests. The rationale follows from an important result proved by Sims, Stock, and 
Watson (1990). We will have cause to refer to several of the results of their paper. Here 
is the key finding of interest: 


Rule 1: Consider a regression equation containing a mixture of I(1) and
I(0) variables such that the residuals are white noise. If the model is such
that the coefficient of interest can be written as a coefficient on zero-mean
stationary variables, then asymptotically, the OLS estimator converges to
a normal distribution. As such, a t-test is appropriate.


Although this rule refers to any regression equation estimated by OLS, it applies 
directly to unit root tests. As shown above, the pth-order autoregressive process: 


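$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_3 y_{t-3} + \cdots + a_{p-2} y_{t-p+2} + a_{p-1} y_{t-p+1} + a_p y_{t-p} + \varepsilon_t$$

can be written as

$$\Delta y_t = a_0 + \gamma y_{t-1} + \beta_2 \Delta y_{t-1} + \beta_3 \Delta y_{t-2} + \cdots + \beta_p \Delta y_{t-p+1} + \varepsilon_t \tag{4.31}$$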


From Rule 1, all the coefficients on the expressions $\Delta y_{t-i}$ converge to
t-distributions. As such, groups of these coefficients will converge to an F-distribution.
Hence, you can perform a test of the form $\beta_2 = \beta_3 = \cdots = \beta_p = 0$ using an F-test.
Nevertheless, under the null hypothesis of a unit root, the value of $\gamma$ multiplies a
nonstationary variable. As such, a test of $\gamma = 0$ cannot be conducted using a standard
t-test.

In addition to the use of F-tests and t-tests, it is also possible to determine the lag 
length using an information criterion such as the AIC or SBC. Of course, in very large 
samples with normally distributed errors, the methods should all select the same lag 
length. In practice, the SBC will select a more parsimonious model than will either the 
AIC or t-tests. Nevertheless, whichever method is used, the researcher must ensure that 
residuals act as white-noise processes. 


An Example: In order to illustrate the various procedures to select the lag length for 
an augmented Dickey—Fuller test, 200 realizations of the following unit root process 
were generated 

$$\Delta y_t = 0.5 + 0.5\Delta y_{t-1} + 0.2\Delta y_{t-3} + \varepsilon_t$$

FIGURE 4.8 Unit Root Plus Drift


Notice that the $\{y_t\}$ sequence contains a single unit root and that the appropriate lag
length is 3. The drift parameter gives the series the decidedly increasing pattern shown 
in Figure 4.8. (You can follow along using the data on the file LAGLENGTH.XLS.) 
Pretend that you do not know the actual DGP. As such, the time path of the sequence 
allows for two possible DGPs; the series may be trend stationary or a unit root process 
containing a drift term. Hence, the null hypothesis is that of a unit root process contain- 
ing a drift against the alternative of a trend stationary process. The appropriate way to 
proceed is to estimate the series under the alternative hypothesis; hence, we estimate a 
regression equation of the form: 


$$\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \sum_{i=1}^{p} \beta_i \Delta y_{t-i} + \varepsilon_t$$

If it is possible to reject the null hypothesis $\gamma = 0$, the process is trend stationary.
The problem is to determine the appropriate value for p. Toward this end, the equation
was estimated for lag lengths of 1 through 4. As given in Table 4.5, the AIC selects a lag
length of three and the SBC selects a lag length of one. Nevertheless, in this instance,
the lag length seems not to make a difference; at the 5% significance level, the critical
value for the null hypothesis $\gamma = 0$ is −3.43. As such, the lag lengths selected by the
AIC and the SBC are such that the null hypothesis of a unit root is not rejected. We can
conclude that the sequence is not trend stationary.

The $\phi_3$ statistic allows us to test the null hypothesis $\gamma = a_2 = 0$; at the 5% significance
level, the critical value is 6.49. As such, for any lag length, we would not reject the null
hypothesis and conclude that the sequence has a stochastic trend. However, at the 5%
significance level, the critical value for the null hypothesis $a_0 = \gamma = a_2 = 0$ (i.e., the
critical value of the $\phi_2$ statistic) is 4.88. For the lag lengths selected by the AIC and
the SBC, this null hypothesis is clearly rejected. The test statistics reflect the fact that
the data-generating process does contain the drift term $a_0$.

Table 4.5 Dickey–Fuller Tests and Lag Length

Lags   AIC         SBC         γ         t-Statistic   φ_2      φ_3
1      1076.211    1089.303    −0.017    −1.776        17.390   1.579
2      1073.076    1089.441    −0.020    −2.049        11.188   2.101
3      1071.817    1091.455    −0.022    −2.285        8.622    2.616
4      1073.799    1096.710    −0.022    −2.276        8.026    2.595
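
The selections in Table 4.5 can be checked mechanically. The sketch below also tests one assumption that the text does not state: if the SBC is computed as $T \ln(\text{SSR}) + n \ln(T)$ and the AIC as $T \ln(\text{SSR}) + 2n$, the two differ by $n(\ln T - 2)$. The value T = 195 is an inferred assumption (200 observations less those lost to differencing and four lags), as is the parameter count n = lags + 3.

```python
import math

# (lags, AIC, SBC) as reported in Table 4.5
table = [
    (1, 1076.211, 1089.303),
    (2, 1073.076, 1089.441),
    (3, 1071.817, 1091.455),
    (4, 1073.799, 1096.710),
]

aic_choice = min(table, key=lambda row: row[1])[0]
sbc_choice = min(table, key=lambda row: row[2])[0]
print("AIC selects", aic_choice, "lags; SBC selects", sbc_choice)

T = 195  # assumed common estimation sample, inferred rather than stated in the text
for lags, aic, sbc in table:
    n = lags + 3  # intercept, trend, lagged level, plus the lagged changes
    implied_gap = n * (math.log(T) - 2)
    print(lags, round(sbc - aic, 3), round(implied_gap, 3))
```

The heavier penalty per parameter, $\ln T$ instead of 2, is what leads the SBC to the more parsimonious model.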


It is also possible to use t-tests and F-tests to determine the lag length. Estimating 
the equation using the lag length p = 4 yields 


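$$\Delta y_t = \underset{(4.05)}{1.24} + \underset{(2.28)}{0.042}\,t - \underset{(-2.28)}{0.022}\,y_{t-1} + \underset{(5.57)}{0.397}\,\Delta y_{t-1} + \underset{(1.42)}{0.108}\,\Delta y_{t-2} + \underset{(1.64)}{0.125}\,\Delta y_{t-3} + \underset{(0.13)}{0.009}\,\Delta y_{t-4} + \varepsilon_t$$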

A t-test for the coefficient on $\Delta y_{t-4}$ suggests a lag length no greater than 3. Moreover,
the F-statistic for the null hypothesis $\beta_3 = \beta_4 = 0$ is 1.59 with a prob-value of
0.206. As such, we can eliminate lags 3 and 4. However, the F-statistic for the null
hypothesis $\beta_2 = \beta_3 = \beta_4 = 0$ is 2.76 with a prob-value of 0.043. Hence, if we use a 5%
significance level, the F-tests select a model with two lags. In this instance, the results
regarding the significance of $\gamma$ are not very sensitive to the alternative lag lengths.

The standard practice is to perform your lag length tests first and then check for
a unit root. After all, the appropriate lag length can be selected regardless of whether 
or not the series in question is stationary. 


The Test with MA Components 


Since an invertible MA model can be transformed into an autoregressive model, the
procedure can be generalized to allow for moving average components. Let the $\{y_t\}$
sequence be generated from the mixed autoregressive/moving average process:

$$A(L)y_t = C(L)\varepsilon_t$$

where A(L) and C(L) are polynomials of orders p and q, respectively.
If the roots of C(L) are outside the unit circle, we can write the $\{y_t\}$ sequence as
the autoregressive process:

$$A(L)y_t/C(L) = \varepsilon_t$$

or, defining D(L) = A(L)/C(L), we can write the process as

$$D(L)y_t = \varepsilon_t$$


Even though D(L) will generally be an infinite-order polynomial, in principle, we 
can use the same technique as used to obtain (4.30) to form the infinite-order autore- 
gressive model: 


$$\Delta y_t = \gamma y_{t-1} + \sum_{i=2}^{\infty} \beta_i \Delta y_{t-i+1} + \varepsilon_t$$

As it stands, this is an infinite-order autoregression that cannot be estimated using
a finite data set. Fortunately, Said and Dickey (1984) have shown that an unknown
ARIMA(p, 1, q) process can often be well approximated by an ARIMA(n, 1, 0)
autoregression of order n where $n < T^{1/3}$. Thus, we can usually solve the problem
by using a finite-order approximation of the infinite-order autoregression. The test
for $\gamma = 0$ can be conducted using the aforementioned Dickey–Fuller $\tau$, $\tau_\mu$, or $\tau_\tau$ test
statistics.




LAG LENGTHS AND NEGATIVE MA TERMS Unit root tests generally work
poorly if the error process has a strongly negative MA component. Although Said and
Dickey (1984) show that an ARIMA(p, 1, q) process can be well approximated
by an ARIMA(n, 1, 0) process ($n < T^{1/3}$), the interaction between the unit root and the
negative MA component can lead to over-rejections of a unit root. To explain the nature
of the problem, consider the ARIMA(0, 1, 1) process:

$$y_t = y_{t-1} + \varepsilon_t - \beta_1 \varepsilon_{t-1}, \qquad 0 < \beta_1 < 1$$

If we have the initial condition $y_0$, we can write the general solution for $y_t$ as

$$y_t = y_0 + \varepsilon_t + (1 - \beta_1) \sum_{i=1}^{t-1} \varepsilon_i$$

Clearly, the $\{y_t\}$ sequence is not stationary since the effects of an $\varepsilon_t$ shock never
decay to zero. However, unlike a random walk process for which $\beta_1 = 0$, the presence
of the negative MA term means that $\varepsilon_t$ has a one-unit effect on $y_t$ in period t only.
Since for all subsequent periods $\partial y_{t+i}/\partial \varepsilon_t = (1 - \beta_1) < 1$, the magnitude of the effect
is diminished when compared to that of a pure random walk. For a finite sample with
t observations, we can construct the autocovariances as

$$\gamma_0 = E[(y_t - y_0)^2] = \sigma^2 + (1 - \beta_1)^2 E[(\varepsilon_{t-1})^2 + (\varepsilon_{t-2})^2 + \cdots + (\varepsilon_1)^2] = [1 + (1 - \beta_1)^2 (t - 1)]\sigma^2$$

$$\gamma_s = E[(y_t - y_0)(y_{t-s} - y_0)] = E[(\varepsilon_t + (1 - \beta_1)\varepsilon_{t-1} + \cdots + (1 - \beta_1)\varepsilon_1)(\varepsilon_{t-s} + (1 - \beta_1)\varepsilon_{t-s-1} + \cdots + (1 - \beta_1)\varepsilon_1)] = (1 - \beta_1)[1 + (1 - \beta_1)(t - s - 1)]\sigma^2$$

The autocorrelations are formed from $\rho_s = \gamma_s/(\gamma_s \gamma_0)^{0.5}$. It is easy to verify that
all of the autocorrelations $\rho_i$ approach unity as the sample size t becomes infinitely
large. However, for the sample sizes usually found in applied work, the autocorrelations
can be small. To see the point, let $\beta_1$ be close to unity so that terms containing
$(1 - \beta_1)^2$ can be safely ignored. In such circumstances, the ACF can be approximated
by $\rho_1 = \rho_2 = \cdots = (1 - \beta_1)^{0.5}$. For example, if $\beta_1 = 0.95$, all of the autocorrelations
should be close to 0.22. As such, the autocorrelations will be small, appear to be
marginally significant, and show little tendency to decay.

From the example, it should not be surprising that unit root tests do not
work well in the presence of a strongly negative MA component. Since many of the
autocorrelations are small, the ACF will resemble that of a truly stationary process.
In fact, if $\beta_1$ is very close to unity, there is a common factor such that
$y_t = y_{t-1} + \varepsilon_t - \beta_1 \varepsilon_{t-1}$ approximates the white-noise process $y_t = \varepsilon_t$. Any test will have a
difficult time distinguishing between the two types of processes and will over-reject the
null hypothesis of a unit root. Moreover, in conducting the test, it is necessary to use
a large number of lags. We can use lag operators to write $\Delta y_t = (1 - \beta_1 L)\varepsilon_t$, so that
$\Delta y_t = -\beta_1 \Delta y_{t-1} - \beta_1^2 \Delta y_{t-2} - \beta_1^3 \Delta y_{t-3} - \cdots + \varepsilon_t$. When $\beta_1$ is large, many
autoregressive lags are needed to properly capture the dynamics of the process. The need to
estimate a large number of coefficients can diminish the power of the test.
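
To get a feel for how many lags "many" can be, the following sketch counts the lags required before the autoregressive weight $\beta_1^i$ falls below a tolerance. The function name and the tolerance of 0.05 are arbitrary assumptions made for illustration.

```python
def lags_needed(beta1, tol=0.05):
    """Smallest i for which the AR(infinity) weight beta1**i falls below tol."""
    if not 0.0 < beta1 < 1.0:
        raise ValueError("beta1 must lie strictly between 0 and 1")
    i, weight = 1, beta1
    while weight >= tol:
        i += 1
        weight *= beta1
    return i


for b in (0.5, 0.8, 0.95):
    print(b, lags_needed(b))
```

A moving average coefficient of 0.5 is handled by a handful of lags, whereas a value of 0.95 calls for dozens, far more than most samples can support.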

Nevertheless, there are some precautions to take when testing for a unit root in the 
presence of a negative MA component. Clearly, you want to use a methodology that 
properly captures the need to use a large number of lags. Ng and Perron (2001) show 
that a modified version of the AIC (MAIC) yields a better estimate of the lag length 
than either the AIC or the BIC. Consider 


    $\text{MAIC} = T\ln(\text{sum of squared residuals}) + 2n + 2\tau(n)$


where $\tau(n) = \hat{\gamma}^2\sum y_{t-1}^2/\hat{\sigma}^2$, $\hat{\gamma}$ is the estimated value of $\gamma$, and
$\hat{\sigma}^2$ is the estimated variance.

Notice that the MAIC is equal to the usual expression for the AIC plus an additional
penalty term $2\tau(n)$. Given that all models are estimated over the same sample period,
the value of $\sum y_{t-1}^2$ is the same for all models. As such, $\tau(n)$ will generally be
small for models with a small value of $\hat{\gamma}^2$ relative to the variance $\hat{\sigma}^2$. Hence,
the MAIC will tend to select the lag length resulting in a value of $\gamma$ closest to that
of a unit root.
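
The calculation can be sketched in a few lines of Python. This is only an illustration of the formula as printed above, not the exact procedure of Ng and Perron (2001), who work with detrended data. The function names `adf_design` and `maic`, the simulated series, and the choice of counting $n$ as the number of estimated coefficients are all assumptions made for the example.

```python
import numpy as np

def adf_design(y, n_lags, max_lags):
    """Dependent variable dy_t and regressors 1, y_{t-1}, dy_{t-1}, ..., dy_{t-n_lags}.

    The first max_lags differences are always dropped so that every lag
    length is estimated over the same sample period.
    """
    dy = np.diff(y)                         # dy[j] = y[j+1] - y[j]
    rows = np.arange(max_lags, len(dy))     # common sample for all lag lengths
    cols = [np.ones(len(rows)), y[rows]]    # y[rows] is the lagged level for dy[rows]
    cols += [dy[rows - i] for i in range(1, n_lags + 1)]
    return dy[rows], np.column_stack(cols)

def maic(y, n_lags, max_lags):
    """MAIC = T ln(SSR) + 2n + 2 tau(n), following the expression in the text."""
    dep, X = adf_design(y, n_lags, max_lags)
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ coef
    T = len(dep)
    ssr = float(resid @ resid)
    sigma2 = ssr / T                        # estimated variance
    gamma_hat = coef[1]                     # coefficient on y_{t-1}
    ylag = X[:, 1]
    tau = gamma_hat**2 * float(ylag @ ylag) / sigma2
    n = X.shape[1]                          # assumption: n counts estimated coefficients
    return T * np.log(ssr) + 2 * n + 2 * tau

# Hypothetical data: a unit root process with a large negative MA term.
rng = np.random.default_rng(0)
eps = rng.standard_normal(201)
y = np.concatenate([[0.0], np.cumsum(eps[1:] - 0.8 * eps[:-1])])

max_lags = 12
scores = [maic(y, n, max_lags) for n in range(max_lags + 1)]
best = int(np.argmin(scores))
print("lag length selected by the MAIC:", best)
```

Holding the sample fixed at `max_lags` matters here: without it, $\sum y_{t-1}^2$ would differ across lag lengths and the penalty terms would not be comparable.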

At one time, it was popular to use the Phillips—Perron (1988) test if a large negative 
moving average component is suspected. However, the test does not generally perform 
as well as the Dickey—Fuller test when using the MAIC. The Phillips—Perron (1988) 
test is discussed in Section 4.6 of the Supplementary Manual. 


Multiple Roots 


Dickey and Pantula (1987) suggest a simple extension of the basic procedure if more 
than one unit root is suspected. In essence, the methodology entails nothing more than 
performing Dickey–Fuller tests on successive differences of $\{y_t\}$. When exactly one
root is suspected, the Dickey–Fuller procedure is to estimate an equation such as
$\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$. In contrast, if two roots are suspected,
estimate the equation:


    $\Delta^2 y_t = a_0 + \beta_1\Delta y_{t-1} + \varepsilon_t$    (4.32)


Use the appropriate statistic (i.e., $\tau$, $\tau_\mu$, or $\tau_\tau$, depending on the
deterministic elements actually included in the regression) to determine whether
$\beta_1$ is significantly different from zero. If you cannot reject the null hypothesis
that $\beta_1 = 0$, conclude that the $\{y_t\}$ sequence is $I(2)$. If $\beta_1$ does differ
from zero, go on to determine whether there is a single unit root by estimating


    $\Delta^2 y_t = a_0 + \beta_1\Delta y_{t-1} + \beta_2 y_{t-1} + \varepsilon_t$    (4.33)


Since there are not two unit roots, you should find that $\beta_1$ and/or $\beta_2$ differ
from zero. Under the null hypothesis of a single unit root, $\beta_1 < 0$ and $\beta_2 = 0$;
under the alternative hypothesis, $\{y_t\}$ is stationary so that $\beta_1$ and $\beta_2$ are
both negative. Thus, estimate (4.33) and use the Dickey–Fuller critical values to test
the null hypothesis $\beta_2 = 0$. If you reject this null hypothesis, conclude that
$\{y_t\}$ is stationary.
As a rule of thumb, economic series do not need to be differenced more than two
times. However, in the odd case in which at most $r$ unit roots are suspected, the
procedure is to first estimate

    $\Delta^r y_t = a_0 + \beta_1\Delta^{r-1}y_{t-1} + \varepsilon_t$


If $\Delta^{r-1}y_t$ is stationary, you should find that $-2 < \beta_1 < 0$. If the
Dickey–Fuller critical values for $\beta_1$ are such that it is not possible to reject the
null of a unit root, you accept the hypothesis that $\{y_t\}$ contains $r$ unit roots. If
we reject this null of exactly $r$ unit roots, the next step is to test for $r - 1$ roots
by estimating


    $\Delta^r y_t = a_0 + \beta_1\Delta^{r-1}y_{t-1} + \beta_2\Delta^{r-2}y_{t-1} + \varepsilon_t$


If both $\beta_1$ and $\beta_2$ differ from zero, reject the null hypothesis of $r - 1$ roots.
You can use the Dickey–Fuller statistics to test the null of exactly $r - 1$ unit roots
if the t-statistics for $\beta_1$ and $\beta_2$ are both statistically different from zero. If
you can reject this null, the next step is to form


    $\Delta^r y_t = a_0 + \beta_1\Delta^{r-1}y_{t-1} + \beta_2\Delta^{r-2}y_{t-1} + \beta_3\Delta^{r-3}y_{t-1} + \varepsilon_t$


As long as it is possible to reject the null hypothesis that the various values of the
$\beta_i$ are zero, continue toward the equation


    $\Delta^r y_t = a_0 + \beta_1\Delta^{r-1}y_{t-1} + \beta_2\Delta^{r-2}y_{t-1} + \beta_3\Delta^{r-3}y_{t-1} + \cdots + \beta_r y_{t-1} + \varepsilon_t$


Continue in this fashion until it is not possible to reject the null of a unit root or until
the $\{y_t\}$ series is shown to be stationary. Notice that this procedure is quite different
from the sequential testing for successively greater numbers of unit roots. It might seem
tempting to test for a single unit root, and if the null cannot be rejected, go on to test
for the presence of a second root. In repeated samples, this method tends to select too
few roots.
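
The downward sequence can be sketched as follows. The code is a simplified illustration, not a replacement for a full implementation: at each step it tests only the newest coefficient against a single user-supplied Dickey–Fuller critical value, it includes an intercept but no trend or augmenting lags, and the names `ols_t` and `dickey_pantula` are hypothetical. The value -2.89 used in the demonstration is the approximate 5% $\tau_\mu$ value for about 100 observations; consult a Dickey–Fuller table for your own sample size.

```python
import numpy as np

def ols_t(dep, X):
    """OLS t-statistics for the null that each coefficient equals zero."""
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ coef
    s2 = resid @ resid / (len(dep) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef / se

def dickey_pantula(y, r, crit):
    """Start from the null of r unit roots and work downward.

    Returns the number of unit roots at which the sequence stops
    (r, r-1, ..., 1), or 0 if every null is rejected.
    """
    y = np.asarray(y, dtype=float)
    diffs = [y]
    for _ in range(r):
        diffs.append(np.diff(diffs[-1]))      # diffs[j] holds the j-th difference
    dep = diffs[r]                            # r-th difference of y_t
    # lagged[j] is the j-th difference dated t-1, aligned with dep
    lagged = [diffs[j][r - 1 - j:-1] for j in range(r)]
    for step in range(1, r + 1):
        cols = [np.ones(len(dep))]
        cols += [lagged[r - k] for k in range(1, step + 1)]
        t = ols_t(dep, np.column_stack(cols))
        if t[-1] >= crit:                     # cannot reject: stop here
            return r - step + 1
    return 0

rng = np.random.default_rng(0)
noise = rng.standard_normal(200)              # stationary by construction
walk = np.cumsum(rng.standard_normal(200))    # one unit root by construction
print("white noise:", dickey_pantula(noise, r=2, crit=-2.89))
print("random walk:", dickey_pantula(walk, r=2, crit=-2.89))
```

Because the random-walk result depends on a test with a 5% size, it will occasionally report zero roots; that is the behavior of the test, not of the code.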


Seasonal Unit Roots 


You will recall that the best-fitting model for U.S. money supply data used in Chapter 2
had the form:

    $(1 - L)(1 - L^4)(1 - a_1L)y_t = (1 + \beta_4L^4)\varepsilon_t$


The specification implies that the money supply has a unit root and a seasonal unit
root. Since seasonality is a key feature of many economic series, a sizable literature
has been developed to test for seasonal unit roots. Before proceeding, note that the first
difference of a seasonal unit root process will not be stationary. To keep matters simple,
suppose that the quarterly observations of $\{y_t\}$ are generated by


    $y_t = y_{t-4} + \varepsilon_t$

Here, the seasonal difference of $\{y_t\}$ is stationary; using the notation of Chapter 2,
we can write $\Delta_4 y_t = \varepsilon_t$. Given the initial condition
$y_0 = y_{-1} = \cdots = 0$, the solution for $y_t$ is


    $y_t = \varepsilon_t + \varepsilon_{t-4} + \varepsilon_{t-8} + \cdots$

so that

    $\Delta y_t = y_t - y_{t-1} = \sum_{i=0}^{t/4}\varepsilon_{4i} - \sum_{i=0}^{t/4}\varepsilon_{4i-1}$


Hence, $\Delta y_t$ equals the difference between two stochastic trends. Since each shock
has a permanent effect on the level of $\Delta y_t$, the sequence is not mean reverting.
However, the seasonal difference of a unit root process may be stationary. For example,
if $\{y_t\}$ is generated by $y_t = y_{t-1} + \varepsilon_t$, the fourth difference (i.e.,
$\Delta_4 y_t = \varepsilon_t + \varepsilon_{t-1} + \varepsilon_{t-2} + \varepsilon_{t-3}$)
is stationary. The point is that the Dickey–Fuller procedure must be modified in order to
test for seasonal unit roots and distinguish between seasonal versus nonseasonal roots.
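
These identities are easy to confirm numerically. The short sketch below builds a seasonal random walk and an ordinary random walk from the same simulated shocks (the seed, the sample length, and the variable names are arbitrary choices for the illustration) and checks each statement made above.

```python
import random

random.seed(0)
T = 40
eps = [0.0] + [random.gauss(0.0, 1.0) for _ in range(T)]   # eps[t] for t = 1, ..., T

seasonal = [0.0] * (T + 1)   # y_t = y_{t-4} + eps_t with zero initial conditions
walk = [0.0] * (T + 1)       # y_t = y_{t-1} + eps_t with y_0 = 0
for t in range(1, T + 1):
    seasonal[t] = (seasonal[t - 4] if t > 4 else 0.0) + eps[t]
    walk[t] = walk[t - 1] + eps[t]

def stochastic_trend(t):
    """Sum of the shocks dated t, t-4, t-8, ... (the solution for y_t)."""
    return sum(eps[s] for s in range(t, 0, -4))

# The seasonal difference of the seasonal random walk is the current shock.
seasonal_d4_ok = all(
    abs((seasonal[t] - seasonal[t - 4]) - eps[t]) < 1e-9 for t in range(5, T + 1)
)

# Its first difference is the gap between two stochastic trends.
first_diff_ok = all(
    abs((seasonal[t] - seasonal[t - 1])
        - (stochastic_trend(t) - stochastic_trend(t - 1))) < 1e-9
    for t in range(2, T + 1)
)

# The fourth difference of an ordinary random walk is the sum of four shocks.
walk_d4_ok = all(
    abs((walk[t] - walk[t - 4]) - sum(eps[t - j] for j in range(4))) < 1e-9
    for t in range(5, T + 1)
)

print(seasonal_d4_ok, first_diff_ok, walk_d4_ok)
```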

There are several alternative ways to treat seasonality in a nonstationary sequence. 
The most direct method occurs when the seasonal pattern is purely deterministic. For 
example, let $D_1$, $D_2$, and $D_3$ represent quarterly seasonal dummy variables such that
the value of $D_i$ is unity in season $i$ and zero otherwise. Estimate the regression
equation:


    $\Delta y_t = a_0 + \alpha_1D_1 + \alpha_2D_2 + \alpha_3D_3 + \gamma y_{t-1} + \sum_{i=2}^{p}\beta_i\Delta y_{t-i+1} + \varepsilon_t$    (4.34)


The null hypothesis of a unit root (i.e., $\gamma = 0$) can be tested using the
Dickey–Fuller $\tau_\mu$ statistic. (Note that you use the $\tau_\mu$ statistic since the
regression contains an intercept.) Rejecting the null hypothesis is equivalent to
accepting the alternative that the $\{y_t\}$ sequence is stationary. The test is possible as
Dickey, Bell, and Miller (1986) show that the limiting distribution for $\gamma$ is not
affected by the removal of the deterministic seasonal components. If you want to include
a time trend in (4.34), use the $\tau_\tau$ statistic.

Notice that the specification in (4.34) makes it difficult to test hypotheses concerning
$a_0$. Since the mean of each $D_i$ series is 1/4, the presence of the seasonal dummies
affects the magnitude of the drift term $a_0$. To correct for this, it is common to use
centered seasonal dummy variables. Simply let $D_i = 0.75$ in season $i$ and $-0.25$ in
each of the other three quarters of the year. Hence, the mean of $D_i = 0$ so that the
magnitude of $a_0$ is unchanged.
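
A minimal sketch of the two kinds of dummies follows. The helper name `seasonal_dummies` is hypothetical, and the code assumes that the sample begins in the first quarter and spans whole years.

```python
def seasonal_dummies(n_obs, centered=False):
    """Return rows [D1, D2, D3]; quarter 4 is the omitted season.

    Assumes the first observation falls in quarter 1.
    """
    rows = []
    for t in range(n_obs):
        quarter = t % 4 + 1
        row = [1.0 if quarter == i else 0.0 for i in (1, 2, 3)]
        if centered:
            row = [d - 0.25 for d in row]   # 0.75 in season i, -0.25 otherwise
        rows.append(row)
    return rows

def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

plain = seasonal_dummies(8)
centered = seasonal_dummies(8, centered=True)
print(column_means(plain))      # each ordinary dummy has a mean of 1/4
print(column_means(centered))   # each centered dummy has a mean of zero
```

Because the centered dummies average to zero over each full year, they shift the seasonal pattern around the intercept without changing its level.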

If you suspect a seasonal unit root, it is necessary to use an alternative procedure. To 
keep the notation simple, suppose you have quarterly observations on the $\{y_t\}$ sequence
and want to test for the presence of a seasonal unit root. To explain the methodology,
note that the polynomial $(1 - \gamma L^4)$ can be factored such that there are four
distinct characteristic roots:


    $(1 - \gamma L^4) = (1 - \gamma^{1/4}L)(1 + \gamma^{1/4}L)(1 - i\gamma^{1/4}L)(1 + i\gamma^{1/4}L)$    (4.35)


If $y_t$ has a seasonal unit root, $\gamma = 1$. Equation (4.35) is a bit restrictive in
that it only allows for a unit root at an annual frequency. Hylleberg et al. (1990)
develop a clever technique that allows you to test for unit roots at various frequencies;
you can test for a nonseasonal unit root, a unit root at a semiannual frequency, and/or a
seasonal unit root. To understand the HEGY test (named after the four authors of the
paper), suppose $y_t$ is generated by

    $A(L)y_t = \varepsilon_t$




where $A(L)$ is a fourth-order polynomial such that

    $(1 - a_1L)(1 + a_2L)(1 - a_3iL)(1 + a_4iL)y_t = \varepsilon_t$    (4.36)


Now, if $a_1 = a_2 = a_3 = a_4 = 1$, (4.36) is equivalent to setting $\gamma = 1$ in (4.35).
Hence, if $a_1 = a_2 = a_3 = a_4 = 1$, there is a seasonal unit root. Consider some of the
other possible cases:


CASE 1


If $a_1 = 1$, one homogeneous solution to (4.36) is $y_t = y_{t-1}$. As such, the $\{y_t\}$
sequence will act as a random walk in that it tends to repeat itself each and
every period. This is the case of a nonseasonal unit root; the appropriate degree of
differencing is $\Delta y_t$.


CASE 2


If $a_2 = 1$, one homogeneous solution to (4.36) is $y_t + y_{t-1} = 0$. In this instance,
the sequence tends to replicate itself at 6-month intervals so that there is a
semiannual unit root. For example, if $y_t = 1$, it follows that $y_{t+1} = -1$,
$y_{t+2} = +1$, $y_{t+3} = -1$, $y_{t+4} = 1$, and so forth.


CASE 3


If either $a_3$ or $a_4$ is equal to unity, the $\{y_t\}$ sequence has an annual cycle. For
example, if $a_3 = 1$, a homogeneous solution to (4.36) is $y_t = iy_{t-1}$. Thus, if
$y_t = 1$, $y_{t+1} = i$, $y_{t+2} = i^2 = -1$, $y_{t+3} = -i$, and $y_{t+4} = -i^2 = 1$ so that
the sequence replicates itself every fourth period. The appropriate degree of
differencing is $\Delta_4 y_t = (1 - L^4)y_t$.
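
Both the factorization and the three cases can be checked with ordinary complex arithmetic. In the sketch below, a polynomial in $L$ is stored as a list of coefficients; the helper names `polymul` and `path` are hypothetical.

```python
def polymul(p, q):
    """Multiply two polynomials in L; p[k] is the coefficient on L**k."""
    out = [0j] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# (1 - L)(1 + L)(1 - iL)(1 + iL), i.e., (4.36) with a1 = a2 = a3 = a4 = 1
factors = [[1, -1], [1, 1], [1, -1j], [1, 1j]]
product = [1 + 0j]
for f in factors:
    product = polymul(product, f)
print(product)   # coefficients of 1 - L**4

def path(root, steps=5, start=1):
    """Homogeneous solution y_{t+1} = root * y_t starting from y_t = start."""
    values = [complex(start)]
    for _ in range(steps - 1):
        values.append(root * values[-1])
    return values

print(path(1))    # Case 1: repeats every period
print(path(-1))   # Case 2: repeats every second period
print(path(1j))   # Case 3: repeats every fourth period
```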


To develop the test, view (4.36) as a function of $a_1$, $a_2$, $a_3$, and $a_4$ and take a
Taylor series approximation of $A(L)$ around the point $a_1 = a_2 = a_3 = a_4 = 1$. Although
the details of the expansion are messy, first, take the partial derivative:


    $\partial A(L)/\partial a_1 = \partial[(1 - a_1L)(1 + a_2L)(1 - a_3iL)(1 + a_4iL)]/\partial a_1$
    $= -(1 + a_2L)(1 - a_3iL)(1 + a_4iL)L$

Evaluating this derivative at the point $a_1 = a_2 = a_3 = a_4 = 1$ yields

    $-L(1 + L)(1 - iL)(1 + iL) = -L(1 + L)(1 + L^2) = -L(1 + L + L^2 + L^3)$

Next, form

    $\partial A(L)/\partial a_2 = \partial[(1 - a_1L)(1 + a_2L)(1 - a_3iL)(1 + a_4iL)]/\partial a_2$
    $= (1 - a_1L)(1 - a_3iL)(1 + a_4iL)L$


Evaluating at the point $a_1 = a_2 = a_3 = a_4 = 1$ yields $(1 - L + L^2 - L^3)L$.
It should not take too long to convince yourself that evaluating $\partial A(L)/\partial a_3$
and $\partial A(L)/\partial a_4$ at the point $a_1 = a_2 = a_3 = a_4 = 1$ yields

    $\partial A(L)/\partial a_3 = -(1 - L^2)(1 + iL)iL$

and

    $\partial A(L)/\partial a_4 = (1 - L^2)(1 - iL)iL$

Since $A(L)$ evaluated at $a_1 = a_2 = a_3 = a_4 = 1$ is $(1 - L^4)$, it is possible to
approximate (4.36) by

    $[(1 - L^4) - L(1 + L + L^2 + L^3)(a_1 - 1) + (1 - L + L^2 - L^3)L(a_2 - 1)$
    $- (1 - L^2)(1 + iL)iL(a_3 - 1) + (1 - L^2)(1 - iL)iL(a_4 - 1)]y_t = \varepsilon_t$

Define $\gamma_i$ such that $\gamma_i = (a_i - 1)$ and note that $(1 + iL)i = i - L$ and
$(1 - iL)i = i + L$; hence,

    $(1 - L^4)y_t = \gamma_1(1 + L + L^2 + L^3)y_{t-1} - \gamma_2(1 - L + L^2 - L^3)y_{t-1}$
    $+ (1 - L^2)[\gamma_3(i - L) - \gamma_4(i + L)]y_{t-1} + \varepsilon_t$

so that

    $(1 - L^4)y_t = \gamma_1(1 + L + L^2 + L^3)y_{t-1} - \gamma_2(1 - L + L^2 - L^3)y_{t-1}$
    $+ (1 - L^2)[(\gamma_3 - \gamma_4)i - (\gamma_3 + \gamma_4)L]y_{t-1} + \varepsilon_t$    (4.37)

To purge the imaginary numbers from this expression, define $\gamma_5$ and $\gamma_6$ such
that $2\gamma_3 = \gamma_6 - i\gamma_5$ and $2\gamma_4 = \gamma_6 + i\gamma_5$. Hence,
$(\gamma_3 - \gamma_4)i = \gamma_5$ and $\gamma_3 + \gamma_4 = \gamma_6$. Substituting into
(4.37) yields

    $(1 - L^4)y_t = \gamma_1(1 + L + L^2 + L^3)y_{t-1} - \gamma_2(1 - L + L^2 - L^3)y_{t-1}$
    $+ (1 - L^2)(\gamma_5 - \gamma_6L)y_{t-1} + \varepsilon_t$

Fortunately, many software packages can perform the test directly on quarterly
and monthly data. However, to understand the mechanics necessary to implement the
procedure, use the following steps:


STEP 1: For quarterly data, form the following variables:

    $y_{1t-1} = (1 + L + L^2 + L^3)y_{t-1} = y_{t-1} + y_{t-2} + y_{t-3} + y_{t-4}$
    $y_{2t-1} = (1 - L + L^2 - L^3)y_{t-1} = y_{t-1} - y_{t-2} + y_{t-3} - y_{t-4}$
    $y_{3t-1} = (1 - L^2)y_{t-1} = y_{t-1} - y_{t-3}$ so that $y_{3t-2} = y_{t-2} - y_{t-4}$


STEP 2: Estimate the regression:


    $(1 - L^4)y_t = \gamma_1y_{1t-1} - \gamma_2y_{2t-1} + \gamma_5y_{3t-1} - \gamma_6y_{3t-2} + \varepsilon_t$


You might want to modify the form of the equation by including an intercept,
deterministic seasonal dummies, and a linear time trend. As in the
augmented form of the Dickey–Fuller test, lagged values of $(1 - L^4)y_{t-i}$
may also be included. Perform the appropriate diagnostic checks to ensure
that the residuals from the regression equation approximate a white-noise
process.

STEP 3: Form the t-statistic for the null hypothesis $\gamma_1 = 0$; a selection of the
appropriate critical values is reported below. If you do not reject the hypothesis
$\gamma_1 = 0$, conclude that $a_1 = 1$ so that there is a nonseasonal unit root. Next,
form the t-test for the hypothesis $\gamma_2 = 0$. If you do not reject the null
hypothesis, conclude that $a_2 = 1$ and that there is a unit root with a semiannual
frequency. Finally, perform the F-test for the hypothesis $\gamma_5 = \gamma_6 = 0$. If the
calculated value is less than the critical value reported in Hylleberg et al.
(1990), conclude that $\gamma_5$ and/or $\gamma_6$ is zero so that there is a seasonal unit
root. Be aware that the three null hypotheses are not alternatives; a series may
have nonseasonal, semiannual, and seasonal unit roots.
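
The transformations in Step 1 can be sketched as follows. The function name `hegy_variables` is hypothetical; it returns, for each usable observation, the dependent variable of Step 2 together with the four regressors, leaving the regression itself to whatever OLS routine you prefer.

```python
def hegy_variables(y):
    """Rows of (dep, y1, y2, y3_1, y3_2) for each t with four lags available.

    dep  = (1 - L^4) y_t
    y1   = y_{t-1} + y_{t-2} + y_{t-3} + y_{t-4}
    y2   = y_{t-1} - y_{t-2} + y_{t-3} - y_{t-4}
    y3_1 = y_{t-1} - y_{t-3}
    y3_2 = y_{t-2} - y_{t-4}
    """
    rows = []
    for t in range(4, len(y)):
        dep = y[t] - y[t - 4]
        y1 = y[t - 1] + y[t - 2] + y[t - 3] + y[t - 4]
        y2 = y[t - 1] - y[t - 2] + y[t - 3] - y[t - 4]
        y3_1 = y[t - 1] - y[t - 3]
        y3_2 = y[t - 2] - y[t - 4]
        rows.append((dep, y1, y2, y3_1, y3_2))
    return rows

# A purely semiannual pattern loads only on y2; y1 and both y3 terms vanish.
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
for row in hegy_variables(alternating):
    print(row)
```

The alternating series illustrates why each transformed variable isolates one frequency: the filter that forms $y_{1t-1}$ removes the semiannual and annual patterns, and the filter that forms $y_{3t-1}$ removes the nonseasonal and semiannual ones.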


At the 5% significance level, Hylleberg et al. (1990) report that the critical values
for 100 and 200 observations are


                                              T = 100                                T = 200
                                 $\gamma_1 = 0$   $\gamma_2 = 0$   $\gamma_5 = \gamma_6 = 0$     $\gamma_1 = 0$   $\gamma_2 = 0$   $\gamma_5 = \gamma_6 = 0$
Intercept                          -2.88    -1.95     3.08            -2.87    -1.92     3.12
Intercept + trend                  -3.47    -1.95     2.96            -3.44    -1.95     3.07
Intercept + seasonal dummies       -2.95    -2.94     6.57            -2.91    -2.89     6.62
Intercept + seasonal dummies       -3.53    -2.94     6.60            -3.49    -2.91     6.57
  + trend


An Example: In Chapter 2, we took the nonseasonal and the seasonal differences of 
the U.S. money supply and estimated a model of the form: 


    $m_t = a_0 + a_1m_{t-1} + \varepsilon_t + \beta_4\varepsilon_{t-4}$


where

    $m_t = (1 - L)(1 - L^4)y_t$


and $y_t$ is the logarithm of U.S. money supply as measured by M1.

We can use the HEGY test to determine if it is appropriate to use the seasonal and 
nonseasonal differences. Since it is clear that the money supply series has a sustained 
upward movement (see Section 11 in Chapter 2), we want to allow for the possibility 
that the series is TS. Hence, we include a deterministic trend and an intercept in the 
regression. You can open the file QUARTERLY.XLS, form y, as above, and estimate 
the following regression 


(1 — L*)y, = 0.062 + 1.88*1074t — 0.003*10-+y,,_, — 0.668y,_; — 0.280y3,_, — 0.217y3,5 
(2.05) (2.17) (—2.17) (-4.17)  (=2.88)  (=2.24) 
3 8 
+9 aD; + È p0 - Ly, ; 
i=1 i=1 


where the lag length of seven was chosen by the general-to-specific method beginning
with a lag length of 12, the $D_i$ are seasonal dummies, and $y_{1t-1}$, $y_{2t-1}$,
$y_{3t-1}$, and $y_{3t-2}$ are defined above.

The coefficient on $y_{1t-1}$ has a t-statistic of -2.17. Given the 5% critical value
(-3.53 for 100 observations and -3.49 for 200 observations when the regression includes
an intercept, seasonal dummies, and a trend), we cannot reject the null hypothesis of a
nonseasonal unit root. The t-statistic for the coefficient on $y_{2t-1}$ is -4.17; so, it
is unlikely that there is a seasonal unit root at a semiannual frequency. The sample
F-statistic for the null hypothesis that the coefficients on $y_{3t-1}$ and $y_{3t-2}$
jointly equal zero is 6.81. Hence, at the 5% significance level, there is not a seasonal
unit root at the annual frequency (6.81 > 6.57). Thus, as in Chapter 2, it
might not have been correct to difference and seasonally difference in the presence of
deterministic seasonal dummy variables. As a group, the seasonal dummies are highly
significant; the sample F-statistic for the presence of the seasonal dummies is 7.49.
Nevertheless, if you experiment with the model in the form $m_t = (1 - L)(1 - L^4)y_t$
used in Chapter 2, you should find the AR(1) and MA(4) terms perform better than a model
with deterministic seasonal dummy variables. Moreover, if you perform the HEGY test
without seasonal dummies, you will find both seasonal and annual unit roots.
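
The three decisions in the example can be written out mechanically. The sketch below applies the rules of Step 3 to the sample statistics reported above; the function name `hegy_decision` is hypothetical, and the critical values are taken from the T = 200 row for a regression with an intercept, seasonal dummies, and a trend, which is an assumption about the relevant sample size (the T = 100 row gives the same conclusions).

```python
def hegy_decision(t1, t2, f56, crit_t1, crit_t2, crit_f):
    """True means the corresponding unit root null is NOT rejected."""
    return {
        "nonseasonal unit root": t1 >= crit_t1,   # t-test of gamma_1 = 0
        "semiannual unit root": t2 >= crit_t2,    # t-test of gamma_2 = 0
        "annual unit root": f56 <= crit_f,        # F-test of gamma_5 = gamma_6 = 0
    }

result = hegy_decision(
    t1=-2.17, t2=-4.17, f56=6.81,
    crit_t1=-3.49, crit_t2=-2.91, crit_f=6.57,
)
for name, kept in result.items():
    print(name, "not rejected" if kept else "rejected")
```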


8. STRUCTURAL CHANGE 


In performing unit root tests, special care must be taken if it is suspected that structural 
change has occurred. When there are structural breaks, the various Dickey—Fuller test 
statistics are biased toward the nonrejection of a unit root. To explain, consider the
situation in which there is a one-time change in the mean of an otherwise stationary
sequence. In the top graph of Figure 4.9, the $\{y_t\}$ sequence was constructed so as to
be stationary around a mean of zero for $t = 0, \ldots, 50$ and then to fluctuate around a
mean of 6 for $t = 51, \ldots, 100$. The sequence was formed by drawing 100 normally and
independently distributed values for the $\{\varepsilon_t\}$ sequence. Setting $y_0 = 0$, the
next 100 values in the sequence were generated using the formula:


    $y_t = 0.5y_{t-1} + \varepsilon_t + D_L$    (4.38)


where $D_L$ is a dummy variable such that $D_L = 0$ for $t = 1, \ldots, 50$ and $D_L = 3$ for
$t = 51, \ldots, 100$. The subscript $L$ is designed to indicate that the level of the dummy
changes. At times, it will be convenient to refer to the value of the dummy variable in
period $t$ as $D_L(t)$; in the example at hand, $D_L(50) = 0$ and $D_L(51) = 3$.

In practice, the structural change may not be as apparent as the break shown in 
the figure. However, the large simulated break is useful for illustrating the problem 
of using a Dickey—Fuller test in such circumstances. The straight line shown in the 
figure highlights the fact that the series appears to have a deterministic trend. In fact, 
the straight line is the best-fitting OLS equation: 


    $y_t = a_0 + a_2t + e_t$


In the figure, you can see that the fitted value of $a_0$ is negative and the fitted value
of $a_2$ is positive. The proper way to estimate (4.38) is to fit a simple AR(1) model and
allow the intercept to change by including the dummy variable $D_L$. However, suppose
that we unsuspectingly fit the regression equation:


    $y_t = a_0 + a_1y_{t-1} + e_t$    (4.39)

FIGURE 4.9 Two Models of Structural Change


As you can infer from Figure 4.9, the estimated value of $a_1$ is necessarily biased
toward unity. The reason for this upward bias is that the estimated value of $a_1$ captures
the property that “low” values of $y_t$ (i.e., those fluctuating around zero) are followed
by other “low” values, and “high” values (i.e., those fluctuating around a mean of six)
are followed by other “high” values. For a formal demonstration, note that as $a_1$
approaches unity, (4.39) approaches a random walk plus drift. We know that the solution
to the random walk plus drift model includes a deterministic trend; that is,


    $y_t = y_0 + a_0t + \sum_{i=1}^{t}\varepsilon_i$


Thus, the misspecified equation (4.39) will tend to mimic the trend line shown in
Figure 4.9 by biasing $a_1$ toward unity. This bias in $a_1$ means that the Dickey–Fuller
test is biased toward accepting the null hypothesis of a unit root even though the series
is stationary within each of the subperiods.

Of course, a unit root process can also exhibit a structural break. The lower portion 
of Figure 4.9 simulates a random walk process with a structural change occurring at 
$t = 51$. This second simulation used the same 100 realizations for the
$\{\varepsilon_t\}$ sequence and the initial condition $y_0 = 2$. The 100 realizations of
the $\{y_t\}$ sequence were constructed as


    $y_t = y_{t-1} + \varepsilon_t + D_P$


where $D_P(51) = 4$ and all other values of $D_P = 0$.

Here, the subscript $P$ refers to the fact that there is a single pulse in the dummy
variable. In a unit root process, a single pulse in the dummy will have a permanent effect
on the level of the $\{y_t\}$ sequence. In $t = 51$, the pulse in the dummy is equivalent to
an $\varepsilon_{51}$ shock of four extra units. Hence, the one-time shock to $D_P(51)$ has a
permanent effect on the mean value of the sequence for $t > 51$. In the figure, you can see
that the
level of the process takes a discrete jump in t = 51, never exhibiting any tendency to 
return to the prebreak level. 

This bias in the Dickey–Fuller tests was confirmed in a Monte Carlo experiment.
Perron (1989) generated 10,000 replications of a process like that of (4.38). Each
replication was formed by drawing 100 normally and independently distributed values for
the $\{\varepsilon_t\}$ sequence. For each of the 10,000 replicated series, he used OLS to
estimate a regression in the form of (4.39). As could be anticipated from our earlier
discussion, Perron found that the estimated values of $a_1$ were biased toward unity.
Moreover, the bias became more pronounced as the magnitude of the break increased.
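
A scaled-down version of such an experiment is sketched below. It is not Perron's code or design: it uses 500 replications rather than 10,000, the break is the shift of three units from (4.38), and the function names and the random seed are assumptions made for the illustration. For comparison, each replication is also estimated with the level dummy included.

```python
import numpy as np

def simulate_break(rng, T=100, break_at=51, shift=3.0, a1=0.5):
    """Generate y_t = a1*y_{t-1} + eps_t + D_L with y_0 = 0, as in (4.38)."""
    eps = rng.standard_normal(T + 1)
    d_level = np.where(np.arange(T + 1) >= break_at, shift, 0.0)
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = a1 * y[t - 1] + eps[t] + d_level[t]
    return y, d_level

def ols(dep, X):
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return coef

rng = np.random.default_rng(0)
a1_ignoring_break, a1_with_dummy = [], []
for _ in range(500):
    y, d = simulate_break(rng)
    dep, lag, const = y[1:], y[:-1], np.ones(len(y) - 1)
    # Misspecified regression (4.39): intercept and y_{t-1} only
    a1_ignoring_break.append(ols(dep, np.column_stack([const, lag]))[1])
    # Regression that allows the intercept to change
    a1_with_dummy.append(ols(dep, np.column_stack([const, lag, d[1:]]))[1])

mean_ignored = float(np.mean(a1_ignoring_break))
mean_dummy = float(np.mean(a1_with_dummy))
print("mean estimate of a1, break ignored: ", round(mean_ignored, 3))
print("mean estimate of a1, dummy included:", round(mean_dummy, 3))
```

Changing `shift` lets you see the second finding as well: the larger the break, the closer the misspecified estimate moves toward unity.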


Perron’s Test for Structural Change 


Returning to the two graphs of Figure 4.9, there may be instances in which the unaided 
eye cannot easily detect the difference between the alternative types of sequences. One 
econometric procedure to test for unit roots in the presence of a structural break involves 
splitting the sample into two parts and using Dickey—Fuller tests on each part. The 
problem with this procedure is that the degrees of freedom for each of the resulting 
regressions are diminished. Moreover, you may not know when the breakpoint actually 
occurs. It is preferable to have a single test based on the full sample. 

Perron (1989) goes on to develop a formal procedure to test for unit roots in the
presence of a structural change at time period $t = \tau + 1$. Consider the null
hypothesis of a one-time jump in the level of a unit root process against the alternative
of a one-time change in the intercept of a trend stationary process. Formally, let the null
and alternative hypotheses be


    $H_1$: $y_t = a_0 + y_{t-1} + \mu_1D_P + \varepsilon_t$    (4.40)
    $A_1$: $y_t = a_0 + a_2t + \mu_2D_L + \varepsilon_t$    (4.41)

FIGURE 4.10 Alternative Representations of Structural Change


where $D_P$ represents a pulse dummy variable such that $D_P = 1$ if $t = \tau + 1$ and zero
otherwise and $D_L$ represents a level dummy variable such that $D_L = 1$ if $t > \tau$ and
zero otherwise.

Under the null hypothesis, $\{y_t\}$ is a unit root process with a one-time jump in the
level of the sequence in period $t = \tau + 1$. Under the alternative hypothesis, $\{y_t\}$
is trend stationary with a one-time jump in the intercept. Figure 4.10 can help you to
visualize the two hypotheses. Simulating (4.40) by setting $a_0 = 0.2$ and using 100
realizations for the $\{\varepsilon_t\}$ sequence, the erratic dashed line in the figure
depicts the time path of $\{y_t\}$ under the null hypothesis. You can see the one-time jump
in the level of the process occurring in period 51. Thereafter, the $\{y_t\}$ sequence
continues the original random walk plus drift process. The alternative hypothesis posits
that the $\{y_t\}$ sequence is stationary around the broken trend line. Up to $t = \tau$,
$\{y_t\}$ is stationary around $a_0 + a_2t$, and beginning at $\tau + 1$, $y_t$ is stationary
around $a_0 + a_2t + \mu_2$. As illustrated by the broken line, there is a one-time
increase in the intercept of the trend if $\mu_2 > 0$.

The econometric problem is to determine whether an observed series is best mod- 
eled by (4.40) or (4.41). The implementation of Perron’s (1989) technique is straight- 
forward: 


STEP 1: Unlike the Dickey–Fuller test, the null hypothesis is not directly embedded
in the alternative hypothesis. In other words, there is no direct way to restrict
the coefficients of the alternative so as to obtain the null hypothesis. As such,
we need to combine the null and alternative as follows:


    $y_t = a_0 + a_1y_{t-1} + a_2t + \mu_1D_P + \mu_2D_L + \varepsilon_t$

STEP 2: Estimate the regression equation formed in Step 1. Under the null hypothesis
of a unit root, the theoretical value of $a_1$ is unity. Perron (1989) shows
that, when the residuals are identically and independently distributed, the
distribution of $a_1$ depends on the proportion of observations occurring prior
to the break. Denote this proportion by $\lambda = \tau/T$ where $T$ = total number of
observations.

STEP 3: Perform diagnostic checks to determine if the residuals from Step 2 are
serially uncorrelated. If there is serial correlation, use the augmented form of the
regression:

    $y_t = a_0 + a_1y_{t-1} + a_2t + \mu_1D_P + \mu_2D_L + \sum_{i=1}^{p}\beta_i\Delta y_{t-i} + \varepsilon_t$

STEP 4: Calculate the t-statistic for the null hypothesis $a_1 = 1$. This statistic can be
compared to the critical values calculated by Perron. Perron generated 5000
series according to $H_1$ using values of $\lambda$ ranging from 0 to 1 by increments
of 0.1. For each value of $\lambda$, he estimated each of the regressions and
calculated the sample distribution of $a_1$. Naturally, the critical values are
identical to the Dickey–Fuller statistics when $\lambda = 0$ and $\lambda = 1$; in effect,
there is no structural change unless $0 < \lambda < 1$. The maximum difference between
the two statistics occurs when $\lambda = 0.5$. For $\lambda = 0.5$, the critical value of the
t-statistic at the 5% level of significance is -3.76 (which is larger in absolute
value than the corresponding Dickey–Fuller statistic of -3.41). If you find a
t-statistic that is more negative than the critical value calculated by Perron, it is
possible to reject the null hypothesis of a unit root.
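
The four steps can be sketched as follows. This is an illustration under stated assumptions rather than a complete implementation: the function names `perron_regression` and `simulate_shifted_ar1` are hypothetical, the break date $\tau$ is taken as known, the diagnostic checks of Step 3 are left to the user (who chooses `n_lags`), and the resulting t-statistic must still be compared with Perron's critical value for the relevant $\lambda$.

```python
import numpy as np

def perron_regression(y, tau, n_lags=0):
    """Estimate y_t = a0 + a1*y_{t-1} + a2*t + mu1*D_P + mu2*D_L + lags + e_t.

    y[0] is the initial condition and y[t] the observation for period t.
    Returns the estimate of a1 and the t-statistic for the null a1 = 1.
    """
    y = np.asarray(y, dtype=float)
    T = len(y) - 1
    tt = np.arange(n_lags + 1, T + 1)            # usable periods
    d_pulse = (tt == tau + 1).astype(float)      # D_P = 1 only in period tau + 1
    d_level = (tt > tau).astype(float)           # D_L = 1 for t > tau
    dy = np.diff(y)                              # dy[j] is the change in period j + 1
    cols = [np.ones(len(tt)), y[tt - 1], tt.astype(float), d_pulse, d_level]
    cols += [dy[tt - i - 1] for i in range(1, n_lags + 1)]
    X, dep = np.column_stack(cols), y[tt]
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ coef
    s2 = resid @ resid / (len(dep) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef[1], (coef[1] - 1.0) / se[1]

def simulate_shifted_ar1(rng, T, tau, shift=1.0, a1=0.5):
    """Stationary AR(1) whose intercept rises by `shift` after period tau."""
    eps = rng.standard_normal(T + 1)
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = a1 * y[t - 1] + eps[t] + (shift if t > tau else 0.0)
    return y

rng = np.random.default_rng(0)
y = simulate_shifted_ar1(rng, T=100, tau=50)
a1_hat, t_stat = perron_regression(y, tau=50)
print("estimate of a1:", round(a1_hat, 3), " t-statistic for a1 = 1:", round(t_stat, 2))
# With lambda = 0.5, compare t_stat with the 5% critical value of -3.76.
```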


In addition, the methodology is quite general in that it can also allow for a one-time 
change in the drift or a one-time change in both the mean and the drift. For example, 
it is possible to test the null hypothesis of a permanent change in the drift term versus 
the alternative of a change in the slope of the trend. Here, the null hypothesis is 


    $H_2$: $y_t = a_0 + y_{t-1} + \mu_2D_L + \varepsilon_t$


where $D_L = 1$ if $t > \tau$ and zero otherwise. With this specification, the $\{y_t\}$
sequence is generated by $\Delta y_t = a_0 + \varepsilon_t$ up to period $\tau$ and by
$\Delta y_t = a_0 + \mu_2 + \varepsilon_t$ thereafter. If $\mu_2 > 0$, the magnitude of the
drift increases for $t > \tau$. Similarly, a reduction in the drift occurs if $\mu_2 < 0$.

The alternative hypothesis posits a trend stationary series with a change in the slope
of the trend for $t > \tau$:

    $A_2$: $y_t = a_0 + a_2t + \mu_3D_T + \varepsilon_t$

where $D_T = t - \tau$ for $t > \tau$ and zero otherwise.

For example, suppose that the break occurs in period 51 so that $\tau = 50$. Thus,
$D_T(1)$ through $D_T(50)$ are all zero, so that, for the first 50 periods, $\{y_t\}$ evolves
as $y_t = a_0 + a_2t + \varepsilon_t$. Beginning with period 51, $D_T(51) = 1$,
$D_T(52) = 2$, ... so that, for $t > \tau$, $\{y_t\}$ evolves as
$y_t = a_0 + a_2t + \mu_3(t - 50) + \varepsilon_t = a_0 + (a_2 + \mu_3)t - 50\mu_3 + \varepsilon_t$.
Hence, $D_T$ changes the slope of the deterministic trend line. The slope of the trend is
$a_2$ for $t \le \tau$ and $a_2 + \mu_3$ for $t > \tau$.
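
The arithmetic of the trend dummy is easy to verify. In the sketch below, the parameter values are arbitrary illustrative choices and the helper names `d_trend` and `trend` are hypothetical; the error term is omitted so that only the deterministic part is examined.

```python
a0, a2, mu3, tau = 1.0, 0.2, 0.1, 50   # illustrative values only

def d_trend(t, tau):
    """D_T = t - tau for t > tau and zero otherwise."""
    return t - tau if t > tau else 0

def trend(t):
    """Deterministic part of the alternative: a0 + a2*t + mu3*D_T."""
    return a0 + a2 * t + mu3 * d_trend(t, tau)

slope_before = trend(40) - trend(39)   # slope of the trend before the break
slope_after = trend(60) - trend(59)    # slope of the trend after the break
print([d_trend(t, tau) for t in (49, 50, 51, 52)])
print(round(slope_before, 6), round(slope_after, 6))
```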
To be even more general, it is possible to combine the two null hypotheses $H_1$ and
$H_2$. A change in both the level and drift of a unit root process can be represented by


    $H_3$: $y_t = a_0 + y_{t-1} + \mu_1D_P + \mu_2D_L + \varepsilon_t$

where $D_P$ and $D_L$ are the pulse and level dummies, respectively, defined earlier.
The appropriate alternative for this case is

    $A_3$: $y_t = a_0 + a_2t + \mu_2D_L + \mu_3D_T + \varepsilon_t$

Again, the procedure entails combining the null and alternative hypotheses into a
single equation. Consider

    $y_t = a_0 + a_1y_{t-1} + a_2t + \mu_1D_P + \mu_2D_L + \mu_3D_T + \varepsilon_t$


Compare the t-statistic from the estimate of $a_1$ to the critical value calculated by
Perron (1989). If the errors from this second regression equation do not appear to be
white noise, estimate the equation in the form of an augmented Dickey–Fuller test.
The t-statistic for the null hypothesis $a_1 = 1$ can be compared to the critical values
calculated by Perron (1989). For various values of $\lambda$, Perron reports the following
critical values of the t-statistic at the 5% significance level:


$\lambda$         $H_1$     $H_2$     $H_3$
0.15-0.25     -3.77     -3.80     -3.99
0.45-0.55     -3.76     -3.96     -4.24
0.65-0.75     -3.80     -3.85     -4.18

Perron’s Test and Real Output 


Perron (1989) used his analysis of structural change to challenge the findings of Nelson 
and Plosser (1982). With the same variables, his results indicate that most macro- 
economic variables are not characterized by unit root processes. Instead, the variables 
appear to be TS processes coupled with structural breaks. According to Perron (1989), 
the stock market crash of 1929 and the dramatic oil price increase of 1973 were exoge- 
nous shocks having permanent effects on the mean of most macroeconomic variables. 
The crash induced a one-time fall in the mean. Otherwise, macroeconomic variables 
appear to be trend stationary. 

All variables in Perron’s study (except real wages, stock prices, and the stationary
unemployment rate) appeared to have a trend with a constant slope and exhibited
a major change in the level around 1929. In order to entertain various hypotheses
concerning the effects of the stock market crash, consider the regression equation:

    $y_t = a_0 + \mu_1D_L + \mu_2D_P + a_2t + a_1y_{t-1} + \sum_{i=1}^{k}\beta_i\Delta y_{t-i} + \varepsilon_t$


where $D_P(1930) = 1$ and zero otherwise


$D_L = 1$ for all $t$ beginning in 1930 and zero otherwise

Table 4.6 Retesting the Data by Nelson and Plosser for Structural Change


                        T     $\lambda$    k    $a_0$     $\mu_1$     $\mu_2$      $a_2$     $a_1$
Real GNP                62    0.33   8    3.44     -0.189    -0.018     0.027    0.282
                                          (5.07)   (-4.28)   (-0.30)    (5.05)   (-5.03)
Nominal GNP             62    0.33   8    5.69     -3.60     0.100      0.036    0.471
                                          (5.44)   (-4.77)   (1.09)     (5.44)   (-5.42)
Industrial production   111   0.66   8    0.120    -0.298    -0.095     0.032    0.322
                                          (4.37)   (-4.56)   (-0.095)   (5.42)   (-5.47)


Notes:

1. $T$ = number of observations; $\lambda$ = proportion of observations occurring before
the structural change; $k$ = lag length.

2. The appropriate t-statistics are in parentheses. For $a_0$, $\mu_1$, $\mu_2$, and $a_2$,
the null is that the coefficient is equal to zero. For $a_1$, the null hypothesis is
$a_1 = 1$. Note that all estimated values of $a_1$ are significantly different from unity
at the 1% level.


Under the presumption of a one-time change in the level of a unit root process,
$a_1 = 1$, $a_2 = 0$, and $\mu_2 \ne 0$. Under the alternative hypothesis of a permanent
one-time break in the trend stationary model, $a_1 < 1$ and $\mu_1 \ne 0$. Perron’s (1989)
results using real GNP, nominal GNP, and industrial production are reported in Table 4.6.
Given the length of each series, the 1929 crash means that $\lambda$ is 1/3 for both real and
nominal GNP and equal to 2/3 for industrial production. Lag lengths (i.e., the values of
$k$) were determined using t-tests on the coefficients $\beta_i$. The value $k$ was selected
if the t-statistic on $\beta_k$ was greater than 1.60 in absolute value and the t-statistic on
$\beta_i$ for $i > k$ was less than 1.60.
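
The lag-selection rule can be written as a small function. This is a simplified sketch: it takes a list of t-statistics as given (the list below is made up for the illustration), whereas in practice each t-statistic would come from an estimated regression. The name `select_lag` is hypothetical.

```python
def select_lag(t_stats, threshold=1.60):
    """t_stats[i - 1] is the t-statistic on beta_i.

    Return the largest k with |t_k| > threshold, so that every
    longer lag has a t-statistic below the threshold in absolute
    value; return 0 if no lag qualifies.
    """
    for k in range(len(t_stats), 0, -1):
        if abs(t_stats[k - 1]) > threshold:
            return k
    return 0

example = [2.5, -0.4, 1.9, 0.3, -1.1]   # hypothetical t-statistics
print(select_lag(example))              # lag 3 is the last one beyond 1.60
```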

First consider the results for real GNP. When you examine the last column of the
table, it is clear that there is little support for the unit root hypothesis; the estimated
value of $a_1 = 0.282$ is significantly different from unity at the 1% level. Instead, real
GNP appears to have a deterministic trend ($a_2$ is estimated to be over five SD from
zero). Also note that the point estimate $\mu_1 = -0.189$ is significantly different from
zero at conventional levels. Thus, the stock market crash is estimated to have induced a
permanent one-time decline in the intercept of real GNP.

These findings receive additional support since the estimated coefficients and their
t-statistics are quite similar across the three equations. All values of $a_1$ are about five
SD from unity, and the coefficients of the deterministic trends ($a_2$) are all over five
SD from zero. Since all the estimated values of $\mu_1$ are significant at the 1% level and
negative, the data seem to support the contention that real macroeconomic variables
are TS, except for a structural break resulting from the stock market crash.


Tests with Simulated Data 


To further illustrate the procedure, 100 random numbers were drawn to represent the
$\{\varepsilon_t\}$ sequence. By setting $y_0 = 0$, the next 100 values in the $\{y_t\}$
sequence were drawn as

    $y_t = 0.5y_{t-1} + \varepsilon_t + D_L$


where $D_L = 0$ for $t = 1, \ldots, 50$ and $D_L = 1$ for $t = 51, \ldots, 100$

Thus, the simulation is identical to (4.38) except that the magnitude of the structural
break is diminished. This simulated series is in the data file labeled BREAK.XLS;
you should try to reproduce the following results. If you were to plot the data, you would
see the same pattern as in Figure 4.10. However, if you did not plot the data or were
otherwise unaware of the break, you might easily conclude that the $\{y_t\}$ sequence had
a unit root. The ACF of the $\{y_t\}$ sequence suggests a unit root process; for example,
the first six autocorrelations are


                     $\rho_1$       $\rho_2$       $\rho_3$       $\rho_4$      $\rho_5$       $\rho_6$
Levels               0.95     0.89     0.86     0.84    0.80     0.77
First differences    -0.002   -0.211   -0.112   0.083   -0.007   -0.025


Dickey—Fuller tests yield 


    $\Delta y_t = -0.0233y_{t-1} + \varepsilon_t$                        t-statistic for $\gamma = 0$: -0.985
    $\Delta y_t = 0.0661 - 0.0566y_{t-1} + \varepsilon_t$                t-statistic for $\gamma = 0$: -1.706
    $\Delta y_t = 0.0488 - 0.1522y_{t-1} + 0.004t + \varepsilon_t$       t-statistic for $\gamma = 0$: -2.734


Diagnostic tests indicate that longer lags are not needed. Regardless of the presence 
of the constant or the trend, the {y,} sequence appears to be DS. Of course, the problem 
is that the structural break biases the data toward suggesting a unit root. 

Now, using the Perron procedure, the first step is to estimate the model 


    $y_t = 0.083 + 0.479y_{t-1} - 0.002t + 0.025D_P + 0.479D_L + \varepsilon_t$
          (1.30)   (5.52)         (1.25)    (0.076)     (5.52)


In the next step, all of the diagnostic statistics indicate that $\{\varepsilon_t\}$
approximates a white-noise process. Finally, since the standard error of $a_1$ is 0.0897,
the t-statistic for $a_1 = 1$ is -6.01 (i.e., $a_1$ is about six SD from unity). Since the
5% critical value is -3.76, we can reject the null of a unit root and conclude that the
simulated data are stationary around a break point at $t = 51$.

Some care must be exercised in using Perron’s procedure since it assumes that the 
date of the structural break is known. In your own work, if the date of the break is 
uncertain, you should consult Amsler and Lee (1995), Perron (1997), Vogelsang and 
Perron (1998), Zivot and Andrews (1992), Enders and Lee (2012), or Lee and Strazicich 
(2003). The entire issue of the July 1992 Journal of Business and Economic Statistics is 
devoted to breakpoints and unit roots. An interesting application is found in Ben-David 
and Papell (1995). They consider a long span (of up to 130 years) of GDP data for 
16 countries. Allowing for breaks, they reject the null of a unit root in approximately 
half of the cases. The appropriate use of the tests of Perron (1989), Zivot and Andrews
(1992), and Lee and Strazicich (2003) is shown in Chapter 6 of the Programming
Manual.


9. POWER AND THE DETERMINISTIC REGRESSORS


Tests for unit roots are not especially good at distinguishing between a series with a 
characteristic root that is close to unity and a true unit root process. Part of the problem 
concerns the power of the test and the presence of the deterministic regressors in the 
estimating equations. 


Power 


Formally, the power of a test is equal to the probability of rejecting a false null hypoth- 
esis (i.e., one minus the probability of a type II error). A test with good power would 
correctly reject the null hypothesis of a unit root when the series in question is actu- 
ally stationary. Monte Carlo simulations have shown that the power of the various 
Dickey—Fuller tests can be very low. As such, these tests will too often indicate that 
a series contains a unit root. The problem is that, in finite samples, any trend station- 
ary process can be arbitrarily well approximated by a unit root process, and a unit root 
process can be arbitrarily well approximated by a trend stationary process. To explain, 
examine the interest rate series and exchange rate series shown in the beginning of 
Chapter 3. If you did not know the actual data-generating processes, it would be difficult 
to tell which, if any, are stationary. Similarly, it is difficult for any statistical procedure 
to distinguish between unit root processes and series that are highly persistent. 

It is simple to conduct a Monte Carlo experiment that determines the power of the
Dickey-Fuller test. To be more specific, suppose that the true data-generating process
for a series is $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$ where $|a_1| < 1$. If you did not know the actual
data-generating process, you might test the series for a unit root using a Dickey-Fuller
test. The question at hand is: How often will the Dickey-Fuller test fail to detect that
the series is actually stationary? Since the confidence intervals for the t-statistics of
the Dickey-Fuller test exceed those for the usual t-test, it is to be expected that the power
of the Dickey-Fuller test is low. To find out the exact answer to the question, we can
generate 10,000 stationary series and apply a Dickey-Fuller test to each. We can then
calculate the percentage of the times that the test correctly identifies a truly stationary
process.

The ability of the test to properly detect that the series is stationary will depend on
the value of $a_1$. We would expect the test to have the least power when $|a_1|$ is close to
unity. Thus, it makes sense to examine how the magnitude of $a_1$ affects the power of
the test. We first construct 100 observations of the series $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$ using a
value of $a_1 = 0.8$ and an $\{\varepsilon_t\}$ sequence drawn from a standardized normal distribution.
The magnitude of $a_0$ is unimportant and, so, is set equal to zero. The initial value
$y_0$ is set equal to the unconditional mean of zero. Next, the simulated series is estimated
in the form $\Delta y_t = a_0 + \gamma y_{t-1} + \varepsilon_t$. The Dickey-Fuller $\tau_\mu$ statistics are used to
determine whether the null hypothesis $\gamma = 0$ can be rejected at the 10%, 5%, and
1% significance levels. The experiment is repeated 10,000 times, and the proportion
of the instances in which the null hypothesis is correctly rejected is recorded. Finally,
the entire experiment is repeated for other values of $a_1$. Consider the following table of
proportions (in percent):


$a_1$      10%     5%      1%

0.80     95.9    87.4    51.4
0.90     52.1    33.1     9.0
0.95     23.4    12.7     2.6
0.99     10.5     5.8     1.3


When the true value of $a_1$ is 0.8, the test does reasonably well. For example, at the
5% significance level, the false null hypothesis of a unit root is rejected in 87.4% of
the Monte Carlo replications. However, when $a_1 = 0.95$, the probability of correctly
rejecting the null hypothesis of a unit root is estimated to be only 12.7% at the 5%
significance level and 2.6% at the 1% level. Thus, the test has very low power to detect
near unit root series.
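
The Monte Carlo experiment described above can be sketched as follows. This is a minimal illustration, not the program used to produce the table: the critical values for the intercept-only regression with T = 100 are assumed from standard published Dickey-Fuller tables, the function names are hypothetical, and only 500 replications are used to keep the run short, so the percentages will differ somewhat from those reported.

```python
import math
import random

# Assumed critical values for the intercept-only Dickey-Fuller regression, T = 100
# (taken from standard published tables, not from this section).
CRITICAL = {"10%": -2.58, "5%": -2.89, "1%": -3.51}

def df_t_stat(y):
    """t-statistic on gamma in dy_t = a0 + gamma*y_{t-1} + e_t, estimated by OLS."""
    x = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    gamma = sum((x[i] - mx) * (dy[i] - md) for i in range(n)) / sxx
    a0 = md - gamma * mx
    ssr = sum((dy[i] - a0 - gamma * x[i]) ** 2 for i in range(n))
    se = math.sqrt(ssr / (n - 2) / sxx)           # standard error of gamma
    return gamma / se

def power(a1, reps=500, T=100, seed=1):
    """Percentage of replications in which the unit root null is rejected."""
    rng = random.Random(seed)
    hits = {level: 0 for level in CRITICAL}
    for _ in range(reps):
        y = [0.0]                                 # y_0 = unconditional mean of zero
        for _ in range(T):
            y.append(a1 * y[-1] + rng.gauss(0.0, 1.0))
        t = df_t_stat(y)
        for level, cv in CRITICAL.items():
            hits[level] += t < cv                 # reject when t is below the critical value
    return {level: 100.0 * count / reps for level, count in hits.items()}

results = {a1: power(a1) for a1 in (0.80, 0.95)}
for a1, row in results.items():
    print(a1, row)
```

Raising `reps` to 10,000 and adding the values 0.90 and 0.99 gives a closer analogue of the experiment in the text.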

Does it matter that it is often impossible to distinguish between borderline station- 
ary, trend stationary, and unit root processes? The realistic answer is that it depends on 
the question at hand. In borderline cases, the short-run forecasts from the alternative 
models may have nearly identical forecasting performance. In fact, Monte Carlo stud- 
ies indicate that when the true data-generating process is stationary but has a root close 
to unity, the one-step-ahead forecasts from a differenced model are usually superior to 
the forecasts from a stationary model. However, the long-run forecasts of a model with 
a deterministic trend will be quite different from those of other models. 


Determination of the Deterministic Regressors 


Unless the researcher knows the actual data-generating process, there is a question
concerning whether it is most appropriate to estimate (4.20), (4.21), or (4.22). It might
seem reasonable to test the hypothesis $\gamma = 0$ using the most general of the models:

$$\Delta y_t = a_0 + \gamma y_{t-1} + a_2t + \sum_{i=2}^{p}\beta_i\Delta y_{t-i+1} + \varepsilon_t \qquad (4.44)$$

After all, if the true process is a random walk process, this regression should find
that $a_0 = \gamma = a_2 = 0$. One problem with this line of reasoning is that the presence of
the additional estimated parameters reduces degrees of freedom and the power of the
test. Reduced power means that the researcher may not be able to reject the null of a unit
root when, in fact, no unit root is present. The second problem is that the appropriate
statistic (i.e., $\tau$, $\tau_\mu$, and $\tau_\tau$) for testing $\gamma = 0$ depends on which regressors are included
in the model. As you can see by examining the three Dickey-Fuller tables, for a given
significance level, the confidence intervals around $\gamma = 0$ dramatically expand if a drift
and a time trend are included in the model. This is quite different from the case in
which $\{y_t\}$ is stationary. When $\{y_t\}$ is stationary, the distribution of the t-statistic does
not depend on the presence of other regressors.

The point is that it is important to use a regression equation that mimics the actual
data-generating process. Inappropriately omitting the intercept or time trend can cause
the power of the test to go to zero. For example, if, as in (4.44), the data-generating
process includes a trend, omitting the term $a_2t$ imparts an upward bias in the estimated
value of $\gamma$. On the other hand, extra regressors increase the critical values so that you
may fail to reject the null of a unit root.

Campbell and Perron (1991) report the following results concerning unit root tests: 


1. If the estimated regression includes deterministic regressors that are not in
the actual data-generating process, the power of the unit root test against
a stationary alternative decreases as additional deterministic regressors
are added. Hence, you do not want to include regressors that are not in the
data-generating process.


2. If the estimated regression omits an important deterministic trending variable
present in the true data-generating process (such as the expression $a_2t$ in
(4.44)), the power of the t-statistic test goes to zero as sample size increases.
If the estimated regression omits a nontrending variable (such as an intercept),
the t-statistic is consistent, but the finite sample power is adversely
affected and decreases as the magnitude of the coefficient on the omitted
component increases. Hence, you do not want to omit regressors that are in
the data-generating process.


The direct implication of these findings is that the researcher may fail to reject the 
null hypothesis of a unit root because of a misspecification concerning the determin- 
istic part of the regression. Too few or too many regressors may cause a failure of the 
test to reject the null of a unit root. How do you know whether to include a drift or 
time trend in performing the tests? The key problem is that the tests for unit roots are 
conditional on the presence of the deterministic regressors and tests for the presence 
of the deterministic regressors are conditional on the presence of a unit root. Although 
we can never be sure that we are including the appropriate deterministic regressors in 
our econometric model, there are some useful guidelines. 


1. Always plot your data. Visual inspection can help you determine whether 
there is a clear trend in the data. 


2. Be clear about the appropriate null hypothesis and the alternative hypothesis.
When you perform a Dickey-Fuller test, always estimate the model
under the alternative hypothesis and impose the restriction implied by the
null hypothesis. Since the null hypothesis is that the series has a unit root,
always estimate the series as if it were stationary or TS. For example, the
real GDP series shown in Figure 4.1 moves decidedly upward over time. The
issue is whether the series is trend stationary or contains a unit root plus a
drift term. As such, the appropriate model to estimate has the form
$\Delta y_t = a_0 + \gamma y_{t-1} + a_2t + \sum\beta_i\Delta y_{t-i} + \varepsilon_t$. You then test the restrictions $\gamma = 0$ and/or
$\gamma = a_2 = 0$. There is no need to estimate a model without $a_2t$ since the alternative
hypothesis is not represented in such a specification.


3. You do not want to reject the null hypothesis when the series actually has a
unit root (a Type I error) or incorrectly accept the null of a unit root when the
series is stationary or TS (a Type II error). Nevertheless, any test involves
the possibility of making such errors. As such, you do not want to perform
needless tests. In the example of real GDP, there is little point in testing the
restriction $a_0 = \gamma = a_2 = 0$ since real GDP clearly increases over time.


4. Testing a restriction on a model that has already been restricted creates the
possibility of compounding your errors. Suppose that a test for the presence
of the time trend allows you to set $a_2 = 0$. A subsequent test of the restriction
$a_0 = \gamma = 0$ in the model $\Delta y_t = a_0 + \gamma y_{t-1} + \sum\beta_i\Delta y_{t-i} + \varepsilon_t$ is conditional on
whether the first test was correct in allowing you to exclude the deterministic
trend.


At one time, researchers would apply a battery of tests on the values of $a_0$ and/or $a_2$
when the form of the deterministic regressors was completely unknown. One standard
procedure is discussed in Section 4.4 of the Supplementary Manual and in Chapter 6
of the Programming Manual. Now, when power seems to be an issue, it is typical to
use variants of the Dickey-Fuller test that have enhanced power.


10. TESTS WITH MORE POWER 


If you examine the basic regression used in the Dickey-Fuller test, $\Delta y_t = a_0 + \gamma y_{t-1} +
a_2t + \varepsilon_t$, you will see that there are two different types of regressors. The intercept and
the time trend are purely deterministic while $y_{t-1}$ is a unit root process under the null
hypothesis. Notice that the coefficients of the deterministic expressions, $a_0$ and $a_2$, play
very different roles under the null and alternative hypotheses. If we change equation
numbers and symbols to match those used in the text, Schmidt and Phillips (1992,
p. 258) make the following observation about the parameters in the Dickey-Fuller
regressions:


“... the parameter $a_0$ represents trend when $\gamma = 0$ (since the solution for
$y_t$ then includes the deterministic trend term $a_0t$), but it determines level
when $\gamma < 0$ (since $y_t$ is then stationary around the level $-a_0/\gamma$). Similarly,
[in equation (4.44)], when $\gamma = 0$, the parameter $a_0$ represents trend
and $a_2$ represents quadratic trend, while under the alternative $a_0$ determines
level and $a_2$ determines trend. This confusion over the meanings of the
parameters shows up in the properties of the Dickey-Fuller tests.”


The essential problem is that the intercept and the slope of the trend are often
poorly estimated in the presence of a unit root. In a sense, the least squares principle is
unable to properly separate the movements of $y_t$ into those induced by the deterministic
trend and those induced by the stochastic trend. Even in the circumstance in which $\{y_t\}$
is stationary, the intercept and trend can be poorly estimated if the $\{y_t\}$ series is quite
persistent. Of course, if the estimates of $a_0$ and $a_2$ have substantial error, the estimate
of $\gamma$ will have a large standard error too. You can see this effect by comparing the
Dickey-Fuller critical values for $\tau$, $\tau_\mu$, and $\tau_\tau$ to those in a standard t-table. The overly
wide confidence interval for $\gamma$ means that you are less likely to reject the null hypothesis
of a unit root even when the true value of $\gamma$ is not zero.

A number of authors have devised clever methods to improve the estimates of
the intercept and trend coefficients. For example, Schmidt and Phillips (1992) proposed
a two-step testing procedure that has better power than the Dickey-Fuller test.
Although they call their test a Lagrange Multiplier (LM) test, the method is actually
quite simple. Instead of the Dickey-Fuller specification, note that under the null hypothesis, the
$\{y_t\}$ sequence is a random walk plus a drift so that:

$$y_t = a_0 + a_2t + \sum_{i=0}^{t-1}\varepsilon_{t-i}$$

or

$$\Delta y_t = a_2 + \varepsilon_t$$

The idea is to estimate the trend coefficient, $a_2$, using the regression $\Delta y_t = a_2 + \varepsilon_t$.
As such, the presence of the stochastic trend $\sum\varepsilon_i$ does not interfere with the estimation
of $a_2$. The resulting estimate of $a_2$ (called $\hat a_2$) is an estimate of the slope of the time
trend. Use this estimate to form the detrended series as $y_t^d = y_t - (y_1 - \hat a_2) - \hat a_2t$, where
$y_1$ is the initial value of the $\{y_t\}$ series. Note that $(y_1 - \hat a_2)$ acts as the intercept of the
estimated trend line and $\hat a_2$ acts as the slope. The use of $(y_1 - \hat a_2)$ in the detrending
procedure ensures that the initial value of the detrended series (i.e., $y_1^d$) is zero. In the
second step of the procedure, you estimate a variant of the Dickey-Fuller test using
the detrended series in place of the level of $y_{t-1}$:

$$\Delta y_t = a_0 + \gamma y_{t-1}^d + \varepsilon_t$$

or, if there is any serial correlation in the residuals, estimate

$$\Delta y_t = a_0 + \gamma y_{t-1}^d + \sum_{i=1}^{p}c_i\Delta y_{t-i}^d + \varepsilon_t$$

The null of a unit root can be rejected if it is found that $\gamma \neq 0$. The point is that
Schmidt and Phillips (1992) show that it is preferable to estimate the parameters of
the trend using a model without the persistent variable $y_{t-1}$. Once the trend is efficiently
estimated, it is possible to detrend the data and perform the unit root test on the
detrended data. Some of the critical values for the test are


Critical Values of the Schmidt-Phillips Unit Root Test


  T       1%      2.5%     5%      10%

  50    -3.73    -3.39    -3.11    -2.80
 100    -3.63    -3.32    -3.06    -2.77
 200    -3.61    -3.30    -3.04    -2.76
 500    -3.59    -3.29    -3.04    -2.76
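
The two steps of the Schmidt-Phillips procedure can be sketched as below. This is an illustration only: the function name is hypothetical, no lagged differences are included, and the series is simulated from an assumed trend-stationary process rather than read from a data file.

```python
import math
import random

def schmidt_phillips_t(y):
    """Return the detrended series and the t-statistic on gamma; y[0] is y_1."""
    T = len(y)
    dy = [y[t] - y[t - 1] for t in range(1, T)]
    a2_hat = sum(dy) / len(dy)                    # step 1: regress dy_t on a constant
    # Detrend with y_t^d = y_t - (y_1 - a2_hat) - a2_hat*t, so that y_1^d = 0
    yd = [y[i] - (y[0] - a2_hat) - a2_hat * (i + 1) for i in range(T)]
    # Step 2: regress dy_t on a constant and the lagged detrended series
    x = yd[:-1]
    n = len(x)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    gamma = sum((x[i] - mx) * (dy[i] - md) for i in range(n)) / sxx
    a0 = md - gamma * mx
    ssr = sum((dy[i] - a0 - gamma * x[i]) ** 2 for i in range(n))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return yd, gamma / se

# Simulated trend-stationary series (an assumption made for illustration).
rng = random.Random(3)
y = [16.4]
for t in range(2, 201):
    y.append(1 + 0.95 * y[-1] + 0.01 * t + rng.gauss(0.0, 1.0))

yd, t_stat = schmidt_phillips_t(y)
print(round(t_stat, 3))   # compare with the critical values for T = 200
```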
Elliott, Rothenberg, and Stock (1996) show that it is possible to further enhance
the power of the test by estimating the model using something close to first differences.
The idea is that, under the alternative hypothesis that the series is stationary, the
Schmidt-Phillips model in first differences is misspecified. Hence, consider the TS
model:

$$y_t = a_0 + a_2t + B(L)\varepsilon_t$$

Instead of creating the first difference of $y_t$, Elliott, Rothenberg, and Stock (ERS)
preselect a constant close to unity, say $\alpha$, and subtract $\alpha y_{t-1}$ from $y_t$ to obtain

$$\tilde y_t = a_0 + a_2t - \alpha a_0 - \alpha a_2(t - 1) + e_t \quad \text{for } t = 2, \ldots, T$$

where $\tilde y_t = y_t - \alpha y_{t-1}$ and $e_t$ is a stationary error term. For $t = 1$, such near differencing
is not possible and the initial value $\tilde y_1$ is set equal to $y_1$. For simplicity, collect terms
with $a_0$ and $a_2$ to obtain

$$\tilde y_t = (1 - \alpha)a_0 + a_2[(1 - \alpha)t + \alpha] + e_t$$

Now, it should be clear how to obtain estimates of $a_0$ and $a_2$ using OLS. Create the
variable $z1_t$ equal to the constant $(1 - \alpha)$ and the variable $z2_t$ equal to the deterministic
trend $\alpha + (1 - \alpha)t$. To obtain the desired estimates of $a_0$ and $a_2$, simply regress $\tilde y_t$ on
$z1_t$ and $z2_t$. In other words, use OLS to estimate:

$$\tilde y_t = a_0z1_t + a_2z2_t + e_t$$

Note that the test is conditional on the initial value of the $\{y_t\}$ series in that
$y_1 = a_0 + a_2 + \varepsilon_1$. As such, the initial values of $z1_t$ and $z2_t$ should be set equal to unity and
the initial value of $\tilde y_t$ should be set equal to $y_1$ (i.e., set $z1_1 = 1$, $z2_1 = 1$, and $\tilde y_1 = y_1$).
Since the goal is to obtain the estimated values of $a_0$ and $a_2$, at this step, it is not
especially important if the residual, $e_t$, is serially correlated. The important point is
that the estimates $\hat a_0$ and $\hat a_2$ can be used to detrend the $\{y_t\}$ series as

$$y_t^d = y_t - \hat a_0 - \hat a_2t$$


In the second step of the procedure, estimate the basic Dickey-Fuller regression
using the detrended data. Hence, estimate the regression equation:

$$\Delta y_t^d = \gamma y_{t-1}^d + \varepsilon_t$$

If there is serial correlation in the residuals, the augmented form of the test can be
estimated as

$$\Delta y_t^d = \gamma y_{t-1}^d + \sum_{i=1}^{p}c_i\Delta y_{t-i}^d + \varepsilon_t$$

Elliott, Rothenberg, and Stock (1996) recommend selecting the lag length $p$ using
the SBC. As in the Schmidt-Phillips test, the null of a unit root can be rejected if it is
found that $\gamma \neq 0$. The critical values of the test depend on whether a trend is included in
the test. If there is an intercept but not a trend, the critical values are precisely those of
the Dickey-Fuller $\tau$ test reported in the top portion of Table A. In essence, you use the
Dickey-Fuller critical values as if there is no intercept in the data-generating process.
If there is a trend, the critical values depend on the value of $\alpha$ selected to form the
“near differenced” variable $\tilde y_t$. ERS report that the value of $\alpha$ that seems to provide the
best overall power is $\alpha = (1 - 7/T)$ for the case of an intercept and $\alpha = (1 - 13.5/T)$ if
there is an intercept and trend. The table below reports the critical values for the case of
a trend and $\alpha = 1 - 13.5/T$. Notice that, as the sample size $T$ increases, $\alpha$ approaches
unity so that $\tilde y_t$ is approximately equal to $\Delta y_t$. In the literature, the ERS test is often
referred to as the Dickey-Fuller generalized least squares (DF-GLS) test.


Critical Values of the ERS Test with Trend and $\alpha = 1 - 13.5/T$


  T          1%      2.5%     5%      10%

  50       -3.77    -3.46    -3.19    -2.89
 100       -3.58    -3.29    -3.03    -2.74
 200       -3.46    -3.18    -2.93    -2.64
 $\infty$  -3.48    -3.15    -2.89    -2.57
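
A sketch of the ERS calculations for the trend case follows. The function names are hypothetical, no lagged differences are included in the second step, and the series is simulated from an assumed process, so the code illustrates the steps rather than reproducing any reported result.

```python
import math
import random

def gls_detrend(y, c=13.5):
    """Estimate a0 and a2 from near-differenced data with alpha = 1 - c/T; y[0] is y_1."""
    T = len(y)
    alpha = 1.0 - c / T
    # Near-differenced series; the first observation stays in levels.
    y_til = [y[0]] + [y[t] - alpha * y[t - 1] for t in range(1, T)]
    z1 = [1.0] + [1.0 - alpha] * (T - 1)
    z2 = [alpha + (1.0 - alpha) * (t + 1) for t in range(T)]   # equals 1 at t = 1
    # OLS of y_til on z1 and z2 (no intercept) from the 2x2 normal equations.
    s11 = sum(a * a for a in z1)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s22 = sum(b * b for b in z2)
    r1 = sum(a * v for a, v in zip(z1, y_til))
    r2 = sum(b * v for b, v in zip(z2, y_til))
    det = s11 * s22 - s12 * s12
    a0 = (r1 * s22 - r2 * s12) / det
    a2 = (r2 * s11 - r1 * s12) / det
    yd = [y[t] - a0 - a2 * (t + 1) for t in range(T)]          # detrended series
    return a0, a2, yd

def df_no_constant_t(yd):
    """t-statistic on gamma in d(yd_t) = gamma*yd_{t-1} + e_t."""
    x = yd[:-1]
    d = [yd[t] - yd[t - 1] for t in range(1, len(yd))]
    sxx = sum(v * v for v in x)
    gamma = sum(a * b for a, b in zip(x, d)) / sxx
    ssr = sum((b - gamma * a) ** 2 for a, b in zip(x, d))
    return gamma / math.sqrt(ssr / (len(x) - 1) / sxx)

# Simulated trend-stationary series (an assumption made for illustration).
rng = random.Random(7)
y = [16.4]
for t in range(2, 201):
    y.append(1 + 0.95 * y[-1] + 0.01 * t + rng.gauss(0.0, 1.0))

a0_hat, a2_hat, yd = gls_detrend(y)
t_stat = df_no_constant_t(yd)
print(round(a0_hat, 3), round(a2_hat, 3), round(t_stat, 3))
```

As a check on the algebra, applying `gls_detrend` to an exact linear trend returns the trend's own intercept and slope.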


One aspect of the ERS test that some researchers might find objectionable is the
assumption that the initial value $\tilde y_1$ is set equal to $y_1$. This is equivalent to assuming
that the initial value of the error term is equal to zero. An alternative assumption
is that the initial value of the shock is drawn from its unconditional distribution. Note
that relaxing the assumption concerning the initial condition acts to reduce the power
of this version of the test. In this circumstance, the first value $\tilde y_1$ is set equal to
$y_1(1 - \alpha^2)^{0.5}$, $z1_1 = (1 - \alpha^2)^{0.5}$, and $z2_1 = (1 - \alpha^2)^{0.5}$. Hence, instead of conditioning
on the magnitude of $y_1$, you condition on the number of SD from zero. Note that
Elliott (1999) recommends using $\alpha = (1 - 10/T)$ regardless of whether or not a trend
is included in the regression. The critical values for this test are different from those
reported above. The asymptotic critical values for regressions with an intercept and an
intercept plus trend are as follows:


              1%      2.5%     5%      10%

Intercept   -3.28    -2.98    -2.73    -2.46
Trend       -3.71    -3.41    -3.17    -2.91


An Example 


In order to illustrate the appropriate use of the procedure, the file labeled
ERSTEST.XLS contains 200 observations generated from the equation:
$y_t = 1 + 0.95y_{t-1} + 0.01t + \varepsilon_t$. Although the series is clearly trend stationary, the point of
this exercise is to illustrate the appropriate use of the ERS test and compare the results
to those of a Dickey-Fuller test. If you examine the file, you will see that the first five
rows are


 t      y           y_tilde     z1        z2        yd

 1    20.03339    20.03339    1.0000    1.0000    0.036376
 2    21.85126     3.170125   0.0675    1.0675    1.692188
 3    22.01347     1.637169   0.0675    1.1350    1.692338
 4    22.08649     1.558934   0.0675    1.2025    1.603304
 5    22.17255     1.576890   0.0675    1.2700    1.527297


The series in column 2, called y, contains the 200 realizations representing
the $y_t$ series. Since the data contain a trend, the appropriate value of $\alpha$
to use is $1 - 13.5/200 = 0.9325$. This value of $\alpha$ was used to construct the
next series (y_tilde) as $y_t - 0.9325y_{t-1}$. For example, $\tilde y_1 = y_1$, $\tilde y_2 = y_2 - \alpha y_1 =
21.85126 - 0.9325(20.03339) = 3.170125$, and $\tilde y_3 = y_3 - \alpha y_2 = 1.637169$. Since
$\alpha = 0.9325$, $z1_2 = z1_3 = \cdots = 1 - \alpha = 0.0675$. Similarly, $z2_t = 0.9325 + 0.0675t$ so
that $z2_1 = 1.0000$, $z2_2 = 1.0675$, $z2_3 = 1.1350$, .... The regression of $\tilde y_t$ on $z1_t$ and
$z2_t$ yields

$$\tilde y_t = 19.835\,z1_t + 0.162\,z2_t$$

These estimates of $a_0$ and $a_2$ are used to construct the detrended series as

$$y_t^d = y_t - 19.835 - 0.162t$$

This series is reported in the last column of ERSTEST.XLS. Before proceeding, it
is interesting to consider the particular solution for the skeleton of $y_t = 1 + 0.95y_{t-1} +
0.01t + \varepsilon_t$. From your knowledge of Chapter 1 (also see question 7 of Chapter 2), you
should have no trouble verifying that the desired solution is $16.2 + 0.2t$. The estimated
trend equation, $19.835 + 0.162t$, is reasonably close to the particular solution.
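
The arithmetic in this example is easy to check. The sketch below uses only the five values of y listed above and the rounded coefficients 19.835 and 0.162, so the detrended values agree with the spreadsheet column only to about three decimal places; the variable and function names are hypothetical.

```python
alpha = 1 - 13.5 / 200                                    # 0.9325
y = [20.03339, 21.85126, 22.01347, 22.08649, 22.17255]    # first five rows of y

# Near-differenced series and the two constructed regressors
y_tilde = [y[0]] + [y[t] - alpha * y[t - 1] for t in range(1, len(y))]
z1 = [1.0] + [1 - alpha] * (len(y) - 1)
z2 = [alpha + (1 - alpha) * t for t in range(1, len(y) + 1)]

# Detrended series using the rounded estimates reported in the text
yd = [y[t - 1] - 19.835 - 0.162 * t for t in range(1, len(y) + 1)]

def skeleton_gap(t):
    """Difference between 16.2 + 0.2t and the skeleton evaluated at that trend."""
    trend = lambda s: 16.2 + 0.2 * s
    return trend(t) - (1 + 0.95 * trend(t - 1) + 0.01 * t)

for row in zip(range(1, 6), y, y_tilde, z1, z2, yd):
    print(row)
print(max(abs(skeleton_gap(t)) for t in range(1, 201)))   # should be essentially zero
```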

Now that $y_t$ has been detrended, it is straightforward to perform the unit root test.
If you use the data in the spreadsheet, you should find


$$\Delta y_t^d = \underset{(-3.154)}{-0.0975}\,y_{t-1}^d$$


The 2.5% and 5% critical values for the test are -3.15 and -2.89, respectively. As
such, the null hypothesis of a unit root is clearly rejected at the 5% level and just barely
rejected at the 2.5% level. You will find that augmenting this regression with lagged
values of $\Delta y_t^d$ only acts to increase the value of the SBC. You can perform Elliott’s
(1999) version of the test in the same way, except that you set $\alpha = 1 - 10/200 = 0.95$,
$\tilde y_1 = y_1(1 - \alpha^2)^{0.5} = 6.255$, $z1_1 = (1 - \alpha^2)^{0.5} = 0.3122$, and $z2_1 = (1 - \alpha^2)^{0.5} = 0.3122$.
Hence, assuming that the initial value of the series is drawn from its unconditional
distribution, you should obtain the t-statistic -3.147. The null hypothesis of a unit root is not
rejected (although it is very close to being rejected) using the 5% critical value of -3.17.
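
The starting values used in Elliott’s version of the test follow directly from the formulas. A short check, with hypothetical variable names:

```python
import math

alpha = 1 - 10 / 200                  # 0.95
scale = math.sqrt(1 - alpha ** 2)     # (1 - alpha^2)^0.5
y1 = 20.03339                         # first observation in ERSTEST.XLS

y_tilde_1 = y1 * scale                # initial near-differenced value
z1_1 = z2_1 = scale                   # initial values of both regressors
print(round(scale, 4), round(y_tilde_1, 3))
```

All later observations are constructed exactly as in the ERS version, with 0.95 in place of 0.9325.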

The results of Elliott’s (1999) test are very similar to the result found from the
Schmidt-Phillips test. To perform the Schmidt-Phillips LM test, you should first
regress $\Delta y_t$ on a constant and obtain: $\Delta y_t = 0.1713$. Since $y_1 = 20.03339$, you detrend
the $y_t$ series using $y_t^d = y_t - (20.03339 - 0.1713) - 0.1713t$. Now, you should
be able to reproduce the regression equation $\Delta y_t = 0.0691 - 0.0903y_{t-1}^d$. Since the
t-statistic for the coefficient on $y_{t-1}^d$ is -3.052, the null hypothesis of a unit root is
just rejected at the 5% significance level. Very different results are obtained when
performing a standard Dickey-Fuller test. Consider the estimated model:


$$\Delta y_t = \underset{(3.265)}{2.0809} + \underset{(3.106)}{0.0158}\,t - \underset{(-3.124)}{0.0979}\,y_{t-1} + \varepsilon_t$$


The estimated value of $\gamma$ is -0.0979, and the t-statistic for the null hypothesis
$\gamma = 0$ is -3.124. From Table A, the critical values of the $\tau_\tau$ statistic at the 5% and
10% significance levels are about -3.45 and -3.15, respectively. Hence, if we use the
Dickey-Fuller test, the null hypothesis of a unit root cannot be rejected at conventional
significance levels.

Section 9 reported the results of a simple Monte Carlo study of the power of the
standard Dickey-Fuller test for the process: $y_t = a_0 + a_1y_{t-1} + \varepsilon_t$. Now, if we use the
ERS test, the proportions (out of 10,000 replications) in which the null hypothesis of a
unit root was correctly rejected are


$a_1$      10%     5%      1%

0.80     99.8    99.1    86.6
0.90     93.9    79.0    33.4
0.95     64.3    39.8    10.0
0.99     23.3    11.1     2.3


Although these results are far superior to those of the Dickey-Fuller test, the power
of the test for large values of $a_1$ is still disappointing.

Section 6.3 of the Programming Manual uses real U.S. GDP to illustrate the appropriate
use of the test.


11. PANEL UNIT ROOT TESTS 


Section 6 presented some strong evidence that the three real exchange rate series shown 
in Figure 4.7 are unit root processes. Of course, it is possible that the series are mean 
reverting but the Dickey—Fuller tests have little power to detect the fact that the series 
are stationary. One way to obtain a more powerful test is to pool the estimates from a 
number of separate series and then test the pooled value. The theory underlying the test 
is very simple: if you have n independent and unbiased estimates of a parameter, the 
mean of the estimates is also unbiased. More importantly, so long as the estimates are
independent, the central limit theorem suggests that the sample mean will be approximately
normally distributed around the true mean.

Im, Pesaran, and Shin (2002) show how to use this result to construct a test for a
unit root when you have a number of similar time-series variables (i.e., a panel). The
only complicating factor is that the OLS estimates for $\gamma$ in the Dickey-Fuller test are
biased downward. Suppose you have $n$ series each containing $T$ observations. For each
series, perform an ADF test of the form

$$\Delta y_{it} = a_{i0} + \gamma_iy_{it-1} + a_{i2}t + \sum_{j=1}^{p_i}\beta_{ij}\Delta y_{it-j} + \varepsilon_{it}, \quad i = 1, \ldots, n \qquad (4.45)$$

Because the lag lengths can differ across equations, you should perform separate
lag length tests for each equation. Moreover, you may choose to exclude the deterministic
time trend. However, if the trend is included in one equation, it should be included
in all. Once you have estimated the various $\gamma_i$, obtain the t-statistic for the null hypothesis
$\gamma_i = 0$. In a traditional Dickey-Fuller test, each of these t-statistics (denoted by
$t_i$) would be compared to the appropriate critical value reported in Table A. However,
for the panel unit root test, form the sample mean of the t-statistics as

$$\bar t = (1/n)\sum_{i=1}^{n}t_i \qquad (4.46)$$

It is straightforward to construct the statistic $Z_{\bar t}$ as

$$Z_{\bar t} = \frac{\sqrt{n}\,[\bar t - E(\bar t)]}{\sqrt{\operatorname{var}(\bar t)}}$$

where $E(\bar t)$ and $\operatorname{var}(\bar t)$ denote the theoretical mean and variance of $\bar t$. If the OLS estimates
of the individual $t_i$ were unbiased, the value of $E(\bar t)$ would be zero. However, to correct
for the bias, the values $E(\bar t)$ and $\operatorname{var}(\bar t)$ can be calculated by Monte Carlo simulation. Im,
Pesaran, and Shin (IPS) report these values as follows:


 T                            6       8      10      15      20      50     100     500

 $E(\bar t)$                -1.52   -1.50   -1.50   -1.51   -1.52   -1.53   -1.53   -1.53
 $\operatorname{var}(\bar t)$   1.75    1.23    1.07    0.92    0.85    0.76    0.74    0.72


Im, Pesaran, and Shin show that $Z_{\bar t}$ has an asymptotic standardized normal distribution.
Hence, for large $T$ and $n$, you can approximate $Z_{\bar t}$ with a normal distribution.
This fact should not be too surprising. If each of the estimated values of the various $t_i$
are independent, the central limit theorem indicates that deviation of the sample average
from the true mean will have a normal distribution. Rejecting the null hypothesis
$Z_{\bar t} = 0$ is equivalent to accepting the alternative hypothesis that at least one value of the
$\gamma_i$ differs from zero. After all, if the sample average of the t-statistics is significantly
different from zero, at least one of the values of $\gamma_i$ is statistically different from zero.
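
The standardization can be sketched as below. The function name and the five t-statistics are made up for illustration; the mean of -1.53 and variance of 0.76 are the tabulated values for T = 50, inserted into the formula exactly as written above.

```python
import math

def ips_z(t_stats, mean_t, var_t):
    """Average the individual t-statistics and standardize the average."""
    n = len(t_stats)
    t_bar = sum(t_stats) / n
    z = math.sqrt(n) * (t_bar - mean_t) / math.sqrt(var_t)
    return t_bar, z

# Hypothetical t-statistics from five ADF regressions with T = 50 (made-up numbers).
t_bar, z = ips_z([-1.2, -2.4, -1.9, -2.8, -1.7], mean_t=-1.53, var_t=0.76)
print(round(t_bar, 3), round(z, 3))
```

A large negative value of `z` relative to the left tail of the standard normal distribution would point toward rejecting the null hypothesis.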

The proof that $Z_{\bar t}$ has a normal distribution relies on very large samples. For the
sample sizes typically used by applied econometricians, it is preferable to use the critical
values contained in Table 4.7. Notice that the critical values depend on $n$, $T$, and
whether a time trend is included in (4.45). For example, if you have seven series each
containing 50 observations and you include a time trend in (4.45), the 5% critical value
for $\bar t$ is -2.67. If you had used the Dickey-Fuller test, the corresponding critical value
for each of the seven values of $t_i$ would be -3.50 (see Table A). Note that it is necessary
to have values of $T$ and $n$ that are greater than four. Large values of $T$ are standard
in time-series econometrics. However, if $n$ is too small, the calculation of $\bar t$ will not be
meaningful.


Table 4.7 Selected Critical Values for the IPS Panel Unit Root Test

                 T = 25                    T = 50                    T = 70
  n        10%     5%     1%         10%     5%     1%         10%     5%     1%

No Time Trend
  5       -2.04   -2.18   -2.46     -2.02   -2.15   -2.42     -2.02   -2.15   -2.40
  7       -1.95   -2.08   -2.32     -1.95   -2.06   -2.28     -1.95   -2.06   -2.28
 10       -1.88   -1.99   -2.19     -1.88   -1.98   -2.16     -1.88   -1.98   -2.16
 15       -1.82   -1.90   -2.07     -1.81   -1.89   -2.05     -1.81   -1.89   -2.04
 25       -1.75   -1.82   -1.94     -1.75   -1.81   -1.93     -1.75   -1.81   -1.93
 50       -1.69   -1.73   -1.82     -1.68   -1.73   -1.81     -1.64   -1.67   -1.73

Time Trend
  5       -2.65   -2.80   -3.09     -2.62   -2.76   -3.02     -2.62   -2.75   -3.00
  7       -2.58   -2.70   -2.94     -2.56   -2.67   -2.88     -2.55   -2.66   -2.67
 10       -2.51   -2.62   -2.82     -2.50   -2.59   -2.77     -2.49   -2.58   -2.75
 15       -2.45   -2.53   -2.69     -2.44   -2.52   -2.65     -2.44   -2.51   -2.65
 25       -2.39   -2.45   -2.58     -2.38   -2.44   -2.55     -2.38   -2.44   -2.54
 50       -2.33   -2.37   -2.45     -2.32   -2.36   -2.44     -2.32   -2.36   -2.44

As mentioned in Section 6, the file PANEL.XLS contains quarterly values
of the real effective exchange rates (CPI based) for Australia, Canada, France,
Germany, Japan, the Netherlands, the United Kingdom, and the United States over
the 1980Q1-2013Q1 period. Since PPP does not allow for a deterministic time trend,
each was estimated in the form of (4.45) but without the trend. The results of the
individual Dickey-Fuller tests for the logarithmic values of the real rates are shown in
the first four columns of Table 4.8. For example, the Australian equation used five lags
of $\{\Delta y_{it}\}$ and the estimated value of $\gamma_i$ was -0.049. Notice that the eight t-statistics
for the null hypothesis $\gamma_i = 0$ have an average value of -2.44. Since each series has a
total of 133 observations, the critical values at the 5% and 1% significance levels are
about -2.06 and -2.28, respectively. Hence, it is possible to reject the null hypothesis
that all values of $\gamma_i = 0$.


Table 4.8 The Panel Unit Root Tests for Real Exchange Rates

                                  Log of the Real Rate            Minus the Common Time Effect
                       Lags    Estimated $\gamma_i$   t-Statistic    Estimated $\gamma_i$   t-Statistic

Australia                5        -0.049        -1.678           -0.043        -1.434
Canada                   7        -0.036        -1.896           -0.035        -1.820
France                   1        -0.079        -2.999           -0.102        -3.433
Germany                  1        -0.068        -2.669           -0.067        -2.669
Japan                    3        -0.054        -2.277           -0.048        -2.137
The Netherlands          1        -0.110        -3.473           -0.137        -3.953
The United Kingdom       1        -0.081        -2.759           -0.069        -2.504
The United States        1        -0.037        -1.764           -0.045        -2.008
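
The average t-statistic for the logs of the real rates can be verified from the individual values reported in Table 4.8. In this sketch the variable names are hypothetical, and the critical values are the approximate ones quoted in the text.

```python
# t-statistics for the log real rates, in the order listed in Table 4.8
t_raw = [-1.678, -1.896, -2.999, -2.669, -2.277, -3.473, -2.759, -1.764]

t_bar = sum(t_raw) / len(t_raw)
crit_5, crit_1 = -2.06, -2.28        # approximate critical values quoted in the text

reject_5 = t_bar < crit_5            # reject when the average is below the critical value
reject_1 = t_bar < crit_1
print(round(t_bar, 2), reject_5, reject_1)
```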

One problem with the results is that the residuals from the individual equations are
contemporaneously correlated in that $E\varepsilon_{it}\varepsilon_{jt} \neq 0$. For example, the correlation coefficient
between the residuals from the French and German equations is 0.67. The explanation
is that the shocks that affect the French real rate are likely to affect the German
real rate. In this circumstance, a common strategy is to subtract a common time effect
from each observation. At time period t, the cross-sectional mean of the series is

$\bar{y}_t = (1/n) \sum_{i=1}^{n} y_{it}$

The method is to subtract this common mean from each observation (i.e., form
$y_{it}^* = y_{it} - \bar{y}_t$) and estimate (4.45) using the values of $y_{it}^*$. In the example at hand, $y_{it}$ is
the logarithm of real rate i at period t; hence, for each time period t, the average of these
logarithmic values was subtracted from $y_{it}$. The right-hand columns of Table 4.8 show
the test results for the $\{y_{it}^*\}$ sequences. Notice that the lag lengths have not changed,
but the average value of the t-statistics is −2.50. As such, it is possible to reject the null
hypothesis that the real rates are not stationary.
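
Subtracting the common time effect is a one-line array operation. The sketch below is an illustration only: the function name and the simulated panel are assumptions, not the PANEL.XLS data.

```python
import numpy as np

def subtract_common_time_effect(y):
    """Remove the cross-sectional mean from each period.

    y is a (T, n) array with T time periods in rows and n series in columns.
    Returns y_star with y_star[t, i] = y[t, i] - mean of y[t, :].
    """
    y = np.asarray(y, dtype=float)
    y_bar = y.mean(axis=1, keepdims=True)  # common time effect, one value per period
    return y - y_bar

# Hypothetical panel: three series that share a common shock each period.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
y = common + 0.5 * rng.normal(size=(200, 3))
y_star = subtract_common_time_effect(y)

# By construction, the adjusted series average to zero in every period.
print(np.abs(y_star.mean(axis=1)).max())
```

The adjusted columns of `y_star` would then be used in place of the original series when estimating the individual Dickey-Fuller regressions.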


Limitations of the Panel Unit Root Test 


1. The null hypothesis for the IPS test is $\gamma_1 = \gamma_2 = \cdots = \gamma_n = 0$. Rejection of
the null hypothesis means that at least one of the $\gamma_i$'s differs from zero. Thus,
it is possible for only one or two values of the $\gamma_i$ to differ from zero and still
reject the null hypothesis. Unfortunately, there is no particular way of knowing
which of the $\gamma_i$ are statistically different from zero. As such, the results
of a panel unit root test may be dependent on the choice of the time-series
variables included in the panel.


2. At this point, there is substantial disagreement about the asymptotic theory
underlying the test. Sample size can approach infinity by increasing n for a
given T, increasing T for a given n, or by simultaneously increasing n and T.
Unfortunately, many of the important findings about the various tests are sensitive
to this seemingly innocuous choice among the various assumptions. For
example, the critical values reported in Table 4.7 are invariant to augmenting
(4.45) with lagged changes for large T. However, for small T and large n, the
critical values are dependent on the magnitudes of the various $\beta_{ij}$.

3. The test requires that the error terms from (4.45) be serially uncorrelated and
contemporaneously uncorrelated. You need to determine the values of $p_i$ to
ensure that the autocorrelations of $\{\varepsilon_{it}\}$ are zero. Nevertheless, the errors
may be contemporaneously correlated in that $E\varepsilon_{it}\varepsilon_{jt} \neq 0$. If the regression
residuals are correlated across equations, the critical values in Table 4.7 are
not applicable. The example above illustrates a common technique to correct
for correlation across equations. As in the example, you can subtract a
common time effect from each observation. However, there is no assurance
that this correction will completely eliminate the correlation. Moreover, it is
quite possible that $\{\bar{y}_t\}$ is nonstationary. Subtracting a nonstationary component
from each sequence is clearly at odds with the notion that the variables
are stationary. As an alternative, many researchers would generate their own
critical values by bootstrapping the value of $\bar{t}$. Some of the details regarding
bootstrapping are described in Section 4.3 of the Supplementary Manual.


There are a number of other panel unit root tests in the literature. The Maddala-Wu
(1999) test is similar to the IPS test but requires that you bootstrap your own critical
values. The Levin-Lin-Chu (2002) test has the more restrictive alternative hypothesis
$\gamma_1 = \gamma_2 = \cdots = \gamma_n$. Nevertheless, the cautions listed above are applicable to all of the
panel unit root tests. An interesting comparison of the tests can be found in the August
2001 issue of the Journal of Money, Credit and Banking. Three different articles perform
various panel unit root tests for a number of real exchange rate series.


12. TRENDS AND UNIVARIATE DECOMPOSITIONS


The findings of Nelson and Plosser (1982) suggest that many economic time series 
have a stochastic trend plus a stationary component. Having observed a series but not 
the individual components, is there any way to decompose the series into its constituent 
parts? Numerous economic theories suggest that it is important to distinguish between 
temporary and permanent movements in a series. A sale (i.e., a temporary price decline) 
is designed to induce us to purchase now rather than in the future. Labor economists 
argue that “hours supplied” is more responsive to a temporary wage increase than to 
a permanent increase. The idea is that workers will temporarily substitute income for 
leisure time. Certainly, modern theories of the consumption function that classify an 
individual’s income into permanent and transitory components highlight the importance
of such a decomposition.

Any such decomposition is straightforward if it is known that the trend in $\{y_t\}$ is
purely deterministic. For example, a linear time trend induces a fixed change in each
and every period. This deterministic trend can be subtracted from the actual value of $y_t$
to obtain the stationary component.

A difficult conceptual issue arises if the trend is stochastic. For example, suppose 
you are asked to measure the current phase of the business cycle. If the trend in GDP is 
stochastic, how is it possible to tell if GDP is above or below trend? The traditional mea- 
surement of a recession by consecutive quarterly declines in real GDP is not helpful. 
After all, if GDP has a deterministic trend component, a negative realization for the sta- 
tionary component may be outweighed by the positive deterministic trend component. 

If it is possible to decompose a sequence into its separate permanent and stationary
components, the issue can be solved. To better understand the nature of stochastic
trends, note that, in contrast to a deterministic trend, a stochastic trend increases on
average by a fixed amount each period. For example, consider the random walk plus
drift model:

$y_t = y_{t-1} + a_0 + \varepsilon_t$

Since $E\varepsilon_t = 0$, the average change in $y_t$ is the deterministic constant $a_0$. Of course,
in any period t, the actual change will differ from $a_0$ by the stochastic quantity $\varepsilon_t$. Yet,
each sequential change in $\{y_t\}$ adds to its level regardless of whether the change results
from the deterministic or the stochastic component. As we saw in (4.5), the random
walk plus drift model has no stationary component; hence, it is a model of pure trend.

The idea that a random walk plus drift is a pure trend has proved especially useful
in time-series analysis. Beveridge and Nelson (1981) show how to decompose any
ARIMA(p, 1, q) model into the sum of a random walk plus drift and a stationary component
(i.e., the general trend plus irregular model). Before considering the general
case, begin with the simple example of an ARIMA(0, 1, 2) model:


$y_t = y_{t-1} + a_0 + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2}$    (4.47)


If $\beta_1 = \beta_2 = 0$, (4.47) is nothing more than the pure random walk plus drift model.
The introduction of the two moving average terms adds a stationary component to the
$\{y_t\}$ sequence. The first step in understanding the procedure of Beveridge and Nelson
(1981) is to obtain the forecast function. For now, keep the issue simple by defining
$e_t = \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2}$ so that we can write $y_t = y_{t-1} + a_0 + e_t$. Given an initial condition
for $y_0$, the general solution for $y_t$ is

$y_t = a_0 t + y_0 + \sum_{i=1}^{t} e_i$    (4.48)


Updating by s periods, we obtain 


$y_{t+s} = a_0 (t + s) + y_0 + \sum_{i=1}^{t+s} e_i$    (4.49)

Substituting (4.48) into (4.49) so as to eliminate $y_0$ yields

$y_{t+s} = a_0 s + y_t + \sum_{i=1}^{s} e_{t+i}$    (4.50)


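To express the solution for $y_{t+s}$ in terms of $\{\varepsilon_t\}$ rather than $\{e_t\}$, note that

$\sum_{i=1}^{s} e_{t+i} = \sum_{i=1}^{s} \varepsilon_{t+i} + \beta_1 \sum_{i=1}^{s} \varepsilon_{t-1+i} + \beta_2 \sum_{i=1}^{s} \varepsilon_{t-2+i}$    (4.51)

so that the solution for $y_{t+s}$ can be written as

$y_{t+s} = a_0 s + y_t + \sum_{i=1}^{s} \varepsilon_{t+i} + \beta_1 \sum_{i=1}^{s} \varepsilon_{t-1+i} + \beta_2 \sum_{i=1}^{s} \varepsilon_{t-2+i}$    (4.52)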


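Now consider the forecast of $y_{t+s}$ for various values of s. Since all values of
$E_t \varepsilon_{t+i} = 0$ for i > 0, it follows that

$E_t y_{t+1} = a_0 + y_t + \beta_1 \varepsilon_t + \beta_2 \varepsilon_{t-1}$
$E_t y_{t+2} = 2a_0 + y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2 \varepsilon_{t-1}$
$E_t y_{t+s} = s a_0 + y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2 \varepsilon_{t-1}$    (4.53)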
Here, the forecasts for all s > 1 are equal to the expression $s a_0 + y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2 \varepsilon_{t-1}$.
Thus, the forecast function converges to a linear function of the forecast horizon
s; the slope of the function equals $a_0$ and the level equals $y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2 \varepsilon_{t-1}$.
This stochastic level can be called the value of the stochastic trend at t and is denoted
by $\mu_t$. This trend plus the deterministic value $a_0 s$ constitutes the forecast $E_t y_{t+s}$. There
are several interesting points to note:


1. The trend is defined to be the conditional expectation of the limiting value of
the forecast function. In lay terms, the trend is the “long-term” forecast. This
forecast will differ at each period t as additional realizations of $\{\varepsilon_t\}$ become
available. At any period t, the stationary component of the series is the difference
between $y_t$ and the trend $\mu_t$. Hence, the stationary component of the
series is

$y_t - \mu_t = -(\beta_1 + \beta_2)\varepsilon_t - \beta_2 \varepsilon_{t-1}$    (4.54)

At any point in time such that $y_t$ is given, the trend and the stationary components
are perfectly correlated (the correlation coefficient being −1).


2. By definition, $\varepsilon_t$ is the innovation in $y_t$, and the variance of the innovation is
$\sigma^2$. Since the change in the trend resulting from a change in $\varepsilon_t$ is $1 + \beta_1 + \beta_2$,
the variance of the innovation in the trend can exceed the variance of the innovation in $y_t$ itself.
If $(1 + \beta_1 + \beta_2)^2 > 1$, the trend is more volatile than $y_t$ since the negative
correlation between $\mu_t$ and the stationary component acts to smooth the $\{y_t\}$
sequence.


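3. The trend is a random walk plus drift. Since the trend at t is $\mu_t$, it follows that
$\mu_t = y_t + (\beta_1 + \beta_2)\varepsilon_t + \beta_2 \varepsilon_{t-1}$. Hence

$\Delta\mu_t = \Delta y_t + (\beta_1 + \beta_2)\Delta\varepsilon_t + \beta_2 \Delta\varepsilon_{t-1}$
$= (y_t - y_{t-1}) + (\beta_1 + \beta_2)\varepsilon_t - \beta_1 \varepsilon_{t-1} - \beta_2 \varepsilon_{t-2}$

Since $y_t - y_{t-1} = a_0 + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2}$,

$\Delta\mu_t = a_0 + (1 + \beta_1 + \beta_2)\varepsilon_t$

Thus, $\mu_t = \mu_{t-1} + a_0 + (1 + \beta_1 + \beta_2)\varepsilon_t$, so that the trend at t is composed of
the drift term $a_0$ plus the white-noise innovation $(1 + \beta_1 + \beta_2)\varepsilon_t$.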


Beveridge and Nelson show how to recover the trend and stationary components
from the data. In the example at hand, estimate the $\{y_t\}$ series using the Box-Jenkins
technique. After differencing the data, an appropriately identified and estimated ARMA
model will yield high-quality estimates of $a_0$, $\beta_1$, and $\beta_2$. Next, obtain $\varepsilon_t$ and $\varepsilon_{t-1}$ as
the one-step-ahead forecast errors of $y_t$ and $y_{t-1}$, respectively. To obtain these values,
use the estimated ARMA model to make in-sample forecasts of each observation of
$y_{t-1}$ and $y_t$. The resulting forecast errors become $\varepsilon_t$ and $\varepsilon_{t-1}$. Combining the estimated
values of $\beta_1$, $\beta_2$, $\varepsilon_t$, and $\varepsilon_{t-1}$ as in (4.54) yields the irregular component. Repeating for
each value of t yields the entire irregular sequence. From (4.54), this irregular component
is $y_t$ minus the value of the trend; hence, the permanent component can be obtained
directly.
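
The recovery of the two components via (4.54) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' procedure: the function name, the parameter values, and the simulated data are hypothetical, and the true innovations stand in for the one-step-ahead forecast errors that would be used in practice.

```python
import numpy as np

def bn_arima012(y, eps, beta1, beta2):
    """Beveridge-Nelson trend and irregular parts for an ARIMA(0, 1, 2) series.

    y and eps are equal-length arrays holding the series and its innovations.
    The first observation is dropped because eps_{t-1} is needed.
    """
    y = np.asarray(y, dtype=float)
    eps = np.asarray(eps, dtype=float)
    irregular = -(beta1 + beta2) * eps[1:] - beta2 * eps[:-1]  # equation (4.54)
    trend = y[1:] - irregular                                   # mu_t = y_t - irregular
    return trend, irregular

# Simulated data with assumed parameter values (illustration only).
a0, beta1, beta2 = 0.1, 0.4, 0.2
rng = np.random.default_rng(1)
eps = rng.normal(size=300)
e = eps.copy()
e[1:] += beta1 * eps[:-1]
e[2:] += beta2 * eps[:-2]
y = np.cumsum(a0 + e)  # y_t = y_{t-1} + a0 + e_t, starting from zero

trend, irregular = bn_arima012(y, eps, beta1, beta2)

# The change in the trend should equal a0 + (1 + beta1 + beta2) * eps_t,
# that is, the trend is a random walk plus drift.
d_trend = np.diff(trend)
expected = a0 + (1 + beta1 + beta2) * eps[2:]
print(np.allclose(d_trend, expected))
```

The final check confirms numerically the third point above: first differences of the constructed trend contain only the drift and a scaled white-noise innovation.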
The General ARIMA(p, 1, q) Model


The first difference of any ARIMA(p, 1, q) series has the stationary infinite-order moving
average representation:

$y_t - y_{t-1} = a_0 + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \cdots$

As in the earlier example, it is useful to define $e_t = \varepsilon_t + \beta_1 \varepsilon_{t-1} + \beta_2 \varepsilon_{t-2} + \beta_3 \varepsilon_{t-3} + \cdots$,
so that it is possible to write the solution for $y_{t+s}$ in the same form as (4.50):

$y_{t+s} = y_t + a_0 s + \sum_{i=1}^{s} e_{t+i}$

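The next step is to express the $\{e_t\}$ sequence in terms of the various values of the
$\{\varepsilon_t\}$ sequence. In this general case, (4.51) becomes

$\sum_{i=1}^{s} e_{t+i} = \sum_{i=1}^{s} \varepsilon_{t+i} + \beta_1 \sum_{i=1}^{s} \varepsilon_{t-1+i} + \beta_2 \sum_{i=1}^{s} \varepsilon_{t-2+i} + \beta_3 \sum_{i=1}^{s} \varepsilon_{t-3+i} + \cdots$    (4.55)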


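Since $E_t \varepsilon_{t+i} = 0$ for i > 0, it follows that the forecast function can be written as

$E_t y_{t+s} = y_t + a_0 s + \left(\sum_{i=1}^{s} \beta_i\right)\varepsilon_t + \left(\sum_{i=2}^{s+1} \beta_i\right)\varepsilon_{t-1} + \left(\sum_{i=3}^{s+2} \beta_i\right)\varepsilon_{t-2} + \cdots$    (4.56)

Now, to find the stochastic trend, take the limiting value of the forecast $E_t(y_{t+s} - a_0 s)$
as s becomes infinitely large. As such, the stochastic trend is

$y_t + \left(\sum_{i=1}^{\infty} \beta_i\right)\varepsilon_t + \left(\sum_{i=2}^{\infty} \beta_i\right)\varepsilon_{t-1} + \left(\sum_{i=3}^{\infty} \beta_i\right)\varepsilon_{t-2} + \cdots$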


The key to operationalizing the decomposition is to recognize that $y_{t+s}$ can be
written as

$y_{t+s} = \Delta y_{t+s} + \Delta y_{t+s-1} + \Delta y_{t+s-2} + \cdots + \Delta y_{t+1} + y_t$

As such, the trend can always be written as the current value of $y_t$ plus the sum
of all of the forecasted changes in the sequence. Abstracting from $a_0 s$, the stochastic
portion of the trend is

$\lim_{s\to\infty} E_t y_{t+s} = \lim_{s\to\infty} E_t[(y_{t+s} - y_{t+s-1}) + (y_{t+s-1} - y_{t+s-2}) + \cdots + (y_{t+2} - y_{t+1}) + (y_{t+1} - y_t)] + y_t$
$= \lim_{s\to\infty} E_t(\Delta y_{t+s} + \Delta y_{t+s-1} + \cdots + \Delta y_{t+2} + \Delta y_{t+1}) + y_t$    (4.57)


The useful feature of (4.57) is that the Box-Jenkins method allows you to calculate
each value of $E_t \Delta y_{t+s}$. For each observation in your data set, find all s-step-ahead
forecasts and construct the sum given by (4.57). Since the irregular component is $y_t$
minus the trend (the limiting forecast net of the deterministic change $a_0 s$), the irregular
component can be constructed as

$y_t - \lim_{s\to\infty} (E_t y_{t+s} - a_0 s) = -\lim_{s\to\infty} [E_t(\Delta y_{t+s} + \Delta y_{t+s-1} + \cdots + \Delta y_{t+2} + \Delta y_{t+1}) - a_0 s]$


Thus, to use the technique of Beveridge and Nelson (1981):

STEP 1: Estimate the first difference of the series using the Box-Jenkins technique.
Select the best-fitting ARMA(p, q) model of the $\{\Delta y_t\}$ sequence.

STEP 2: Using the best-fitting ARMA model, for each time period t = 1, ..., T, find
the one-step-ahead, two-step-ahead, ..., s-step-ahead forecasts; that is,
find $E_t \Delta y_{t+s}$ for each value of t and s. For each value of t, use these forecasted
values to construct the sums $E_t[\Delta y_{t+s} + \Delta y_{t+s-1} + \cdots + \Delta y_{t+1}] + y_t$.
In practice, it is necessary to find a reasonable approximation to (4.57); in
their own work, Beveridge and Nelson let s = 100. For example, for the first
usable observation (i.e., t = 1), find the sum:

$\mu_1 = E_1(\Delta y_{101} + \Delta y_{100} + \cdots + \Delta y_2) + y_1$

The value of $y_1$ plus the sum of these forecasted changes equals $E_1 y_{101}$; the
stochastic portion of the trend in period 1 is $E_1 y_{101} - a_0 s$ and the deterministic
portion is $a_0 s$. Similarly, for t = 2, construct

$\mu_2 = E_2(\Delta y_{102} + \Delta y_{101} + \cdots + \Delta y_3) + y_2$

If there are T observations in your data set, the trend component for the last
period is

$\mu_T = E_T(\Delta y_{T+100} + \Delta y_{T+99} + \cdots + \Delta y_{T+1}) + y_T$

The entire sequence of constructed trends (i.e., $\mu_1, \mu_2, \ldots, \mu_T$) constitutes
the $\{\mu_t\}$ sequence.

STEP 3: Form the irregular component at t by subtracting the stochastic portion of
the trend at t from the value of $y_t$. Thus, for each observation t, the irregular
component is $-[E_t(\Delta y_{t+100} + \Delta y_{t+99} + \cdots + \Delta y_{t+1}) - a_0 s]$.

Note that, for many series, the value of s can be quite small. For
example, in the ARIMA(0, 1, 2) model of (4.47), the value of s can be set
equal to 2 since, apart from the deterministic change $a_0$, all forecasted
changes for s > 2 are equal to zero. If the ARMA model
that is estimated in Step 1 has slowly decaying autoregressive components,
the value of s should be large enough that the s-step-ahead forecasts
converge to the deterministic change $a_0$.


Two Examples: The file PANEL.XLS contains quarterly values of the real British
pound. The series was estimated as the ARIMA(0, 1, 1) process:

$\Delta y_t = -0.0004 + \varepsilon_t + 0.386\varepsilon_{t-1}$
        (−0.11)          (4.75)

where $\Delta y_t$ is the logarithmic change in the real British pound.




Although it is often desirable to maintain an insignificant intercept term in a regres- 
sion, in this case, it is clearly undesirable since it imparts a deterministic trend into the 
real exchange rate. As such, reestimate the model without the intercept to obtain 


$\Delta y_t = \varepsilon_t + 0.386\varepsilon_{t-1}$


Step 2 requires that, for each observation, we form the one-step-ahead through 
s-step-ahead forecasts. For this model, the mechanics are trivial since, for each period t, 
the one-step-ahead forecast is 


$E_t \Delta y_{t+1} = 0.386\varepsilon_t$

and all other s-step-ahead forecasts are zero.

Thus, for each observation t, the summation $E_t(\Delta y_{t+100} + \Delta y_{t+99} + \cdots + \Delta y_{t+1})$ is
equal to $0.386\varepsilon_t$. As such, for 1980Q2 (the first usable observation in the sample), the
stochastic portion of the trend is $y_{1980Q2} + 0.386\varepsilon_{1980Q2}$ and the temporary portion of
$y_{1980Q2}$ is $-0.386\varepsilon_{1980Q2}$. Repeating for each point in the data set yields the irregular
and permanent components of the sequence. The estimated ARIMA(0, 1, 1) model is
the special case of (4.47) in which $a_0$ and $\beta_2$ are set equal to zero. As such, you should
be able to write the equivalent of (4.48) through (4.54) for the real pound.
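
For an MA(1) in first differences, the whole decomposition reduces to recovering the innovations recursively. The sketch below assumes a zero pre-sample innovation and uses made-up log levels, not the PANEL.XLS series; the function name is hypothetical.

```python
import numpy as np

def bn_ma1(y, theta, eps0=0.0):
    """BN decomposition when dy_t = eps_t + theta * eps_{t-1} (no intercept).

    y is the (log) level series. Innovations are recovered recursively from
    eps_t = dy_t - theta * eps_{t-1}, starting from the assumed value eps0.
    Returns (trend, irregular, eps) for the second through last observations.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    eps = np.empty_like(dy)
    prev = eps0
    for t, change in enumerate(dy):
        eps[t] = change - theta * prev
        prev = eps[t]
    trend = y[1:] + theta * eps   # y_t plus the single nonzero forecasted change
    irregular = -theta * eps      # temporary portion of y_t
    return trend, irregular, eps

# Made-up log levels, with the moving average coefficient reported in the text.
y = [4.60, 4.62, 4.61, 4.65]
trend, irregular, eps = bn_ma1(y, theta=0.386)
print(trend, irregular)
```

With a long sample, the influence of the assumed starting value `eps0` dies out quickly because the coefficient 0.386 is well inside the unit interval.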

We have verified that the real U.S. GDP is the unit root process

$\Delta lrgdp_t = 0.0078 + 0.3706\,\Delta lrgdp_{t-1}$

Now, it is more difficult to calculate the sum of the forecasted changes.
Nevertheless, it is worthwhile to illustrate the process for the first few values. In
1947Q2, the value of $lrgdp_t$ was 7.4776 and the value of $\Delta lrgdp_t$ was −0.00153.
Since we are not interested in the deterministic portion of the trend, conditional
on the information available in 1947Q2, the one-step-ahead forecast for
1947Q3 is $-5.670 \times 10^{-4} = (0.3706)(-0.00153)$ and the two-step-ahead forecast
is $-2.101 \times 10^{-4} = (0.3706)(-5.670 \times 10^{-4})$. The forecasts quickly converge to
zero after a few periods. Adding up all such forecasted changes, you should obtain
−0.0009. Thus, abstracting from the deterministic portion of the trend, the log of real
GDP is forecasted to change by −0.0009 in the very long run. Adding this sum to
$lrgdp_{1947Q2}$ yields the stochastic component as 7.4776 − 0.0009 = 7.4767. If you take
the antilogs, you find the actual level of real GDP in 1947Q2 to be $1768 billion and
the permanent component to be about $1766 billion.
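
The sum of forecasted changes for the AR(1) growth model can be reproduced with a short loop. This is a minimal sketch that follows the arithmetic in the text (the drift is set aside); the function name and the horizon of 100 are illustrative choices.

```python
def sum_forecasted_changes(last_change, phi, horizon=100):
    """Sum the 1- to horizon-step-ahead forecasts of the stochastic part of an
    AR(1) in first differences, where the j-step forecast is phi**j * last_change."""
    total, forecast = 0.0, last_change
    for _ in range(horizon):
        forecast = phi * forecast  # next-step forecast of the change
        total += forecast
    return total

phi = 0.3706           # AR(1) coefficient reported in the text
dy_1947q2 = -0.00153   # change in log real GDP in 1947Q2, from the text

one_step = phi * dy_1947q2
two_step = phi * one_step
total = sum_forecasted_changes(dy_1947q2, phi)

# Closed form of the infinite sum: last_change * phi / (1 - phi)
closed_form = dy_1947q2 * phi / (1 - phi)

print(f"{one_step:.3e}  {two_step:.3e}  {total:.4f}")
```

Because the forecasts decay geometrically, the truncated sum and the closed form agree to many decimal places, which is why a modest horizon suffices for this model.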

Repeating this process for all observations in the data set yields the time path of the 
trend component of real GDP. If you were to plot the trend along with the actual values, 
you would find that the two series virtually overlap. Since the autoregressive coefficient 
is so small, virtually all of the movements in the real GDP series are permanent. The 
cyclical component is plotted in Panel (a) of Figure 4.11. Note that the series seems
to be more jagged than what is normally deemed to be the business cycle. Nevertheless, the
decomposed series does well in the early and late 1970s and during the financial crisis.


The Hodrick-Prescott Decomposition 


Another method of decomposing a series into a trend and a stationary component
has been developed by Hodrick and Prescott (1997). Suppose you observe the values
$y_1$ through $y_T$ and want to decompose the series into a trend $\{\mu_t\}$ and a stationary
component $y_t - \mu_t$. Consider the sum of squares

$\frac{1}{T} \sum_{t=1}^{T} (y_t - \mu_t)^2 + \frac{\lambda}{T} \sum_{t=2}^{T-1} [(\mu_{t+1} - \mu_t) - (\mu_t - \mu_{t-1})]^2$

where $\lambda$ is a constant and T is the number of usable observations.

The problem is to select the $\{\mu_t\}$ sequence so as to minimize this sum of squares.
In the minimization problem, $\lambda$ is an arbitrary constant reflecting the “cost” or penalty
of incorporating fluctuations into the trend. In applications with quarterly data, including
Hodrick and Prescott (1984) and Farmer (1993), $\lambda$ is usually set equal to 1,600.
Increasing the value of $\lambda$ acts to “smooth out” the trend. If $\lambda = 0$, the sum of squares is
minimized when $y_t = \mu_t$; the trend is equal to $y_t$ itself. As $\lambda \to \infty$, the sum of squares is
minimized when $(\mu_{t+1} - \mu_t) = (\mu_t - \mu_{t-1})$. As such, as $\lambda \to \infty$, the change in the trend
is constant; the result is that there is a linear time trend. Intuitively, for large values
of $\lambda$, the Hodrick-Prescott (HP) decomposition forces the change in the slope of the trend (i.e.,
$\Delta\mu_{t+1} - \Delta\mu_t$) to be as small as possible. This occurs when the trend is linear.
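
Because the sum of squares is quadratic in the trend, the minimizing sequence solves a linear system. The following is a minimal dense-matrix sketch in Python with NumPy, offered as an illustration rather than as the routine used to produce the figures; the function name and the simulated series are assumptions.

```python
import numpy as np

def hp_trend(y, lam=1600.0):
    """Trend minimizing sum (y_t - mu_t)^2 + lam * sum (second difference of mu)^2.

    The first-order conditions are (I + lam * D'D) mu = y, where D is the
    (T-2) x T second-difference matrix. The common 1/T scaling in the text
    does not change the minimizer, so it is omitted here.
    """
    y = np.asarray(y, dtype=float)
    T = y.size
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]  # mu_{t+1} - 2*mu_t + mu_{t-1}
    return np.linalg.solve(np.eye(T) + lam * (D.T @ D), y)

# Hypothetical series: a random walk with drift.
rng = np.random.default_rng(2)
y = np.cumsum(0.5 + rng.normal(size=120))
trend = hp_trend(y)   # lambda = 1600, the usual quarterly setting
cycle = y - trend
```

Setting `lam=0` returns the series itself, and a series that is already an exact straight line is returned unchanged for any `lam`, which matches the two limiting cases described above. For long series, a sparse or banded solver would be the more economical design.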

The benefit of the Hodrick-Prescott decomposition is that it uses the same method
to extract the trend from a set of variables. For example, many real business cycle models
indicate that all variables will have the same stochastic trend. A Beveridge-Nelson
decomposition separately applied to each variable will not yield the same trend for
each. Panel (b) of Figure 4.11 shows the relatively smooth cycle for the GDP series
obtained from the HP filter. There is a problem in that the decomposition indicates
that the economy was operating above trend in 2011 and 2012. Figure 4.12 shows the
HP filter applied to real U.S. GDP, consumption, and investment. You can see that
the smoothed lines (representing the trends extracted by the HP decomposition) are
such that the permanent components of each series account for the majority of the
variation. However, a word of warning is in order. Since the HP filter is a function
that smooths the trend, it has been shown to introduce spurious fluctuations into the
irregular component of a series. The filter forces the stochastic trend to be a smoothed
version of $(\mu_{t+1} - \mu_t) - (\mu_t - \mu_{t-1})$. As such, the filter works best if the $\{y_t\}$ series is
I(2), so that smoothing the second difference of the stochastic trend is appropriate.

FIGURE 4.11 Two Decompositions of GDP

FIGURE 4.12 Real GDP, Consumption and Investment

Note that other types of decompositions are possible. Section 4.5 of the Supple- 
mentary Manual examines an unobserved components decomposition of GDP into a 
trend and cycle. 


13. SUMMARY AND CONCLUSIONS 


The trend in a series can contain both stochastic and deterministic components. Dif- 
ferencing can remove a stochastic trend, and detrending can eliminate a deterministic 
trend. However, it is inappropriate to difference a trend-stationary series and to detrend 
a series containing a stochastic trend. The resultant irregular component of the series 
can be estimated using Box-Jenkins techniques.

In contrast to traditional theory, the consensus view is that most macroeconomic 
time series contain a stochastic trend. In finite samples, the correlogram of a unit root 
process will decay slowly. As such, a slowly decaying ACF can be indicative of a unit 
root or a near unit root process. The issue is especially important since many economic 
time series appear to have a nonstationary component. When you encounter such a 
time series, do you detrend, do you first difference, or do you do nothing since the 
series might be stationary? 

Adherents of the Box-Jenkins methodology recommend differencing a nonstationary
variable or a variable with a near unit root. For very short-term forecasts, the
form of the trend is nonessential. Differencing also reveals the pattern of the other 
autoregressive and moving average coefficients. However, as the forecast horizon 
expands, the precise form of the trend becomes increasingly important. Stationarity
implies the absence of a trend and long-run mean reversion. A deterministic trend 
implies steady increases (or decreases) into the infinite future. Forecasts of a series 
with a stochastic trend converge to a steady level. As illustrated by the distinction 
between real business cycles and the more traditional formulations, the nature of the 
trend may have important theoretical implications. 

The usual t-statistics and F-statistics are not applicable to determine whether or
not a sequence has a unit root. Dickey and Fuller (1979, 1981) provide the appropriate
test statistics to determine whether a series contains a unit root, a unit root plus drift,
and/or a unit root plus drift plus a time trend. The tests can also be modified to account
for seasonal unit roots. Structural breaks will bias the Dickey-Fuller test toward the
nonrejection of a unit root. Perron (1989) shows how it is possible to incorporate a
known structural change into the tests for unit roots. Caution needs to be exercised
because it is always possible to argue that structural change has occurred; each year
has something different about it than the previous year.

All the aforementioned tests have very low power to distinguish between a unit root
and a near unit root process. A trend-stationary process can be arbitrarily well approximated
by a unit root process, and a unit root process can be arbitrarily well approximated
by a trend-stationary process. Moreover, the testing procedure is confounded
by the presence of the deterministic regressors (i.e., the intercept and the deterministic
trend). The testing regression is misspecified if it omits any of the deterministic regressors
in the data-generating process. However, too many regressors reduce the power of
the tests. DF-GLS detrending methods generally have much better power than the traditional
Dickey-Fuller tests. If a reasonable number of similar series are available (such
as the real exchange rates from a number of countries), panel unit root tests can be used.

The fact that macroeconomic variables are not mean reverting makes it difficult to 
calculate the trend and cyclical components of GDP and its subcomponents. After all, 
traditional detrending yields nothing like a stationary cyclical component when a series 
contains a stochastic trend. Several methods have been devised to decompose real GDP 
into its permanent and temporary components. The method by Beveridge and Nelson
(1981) indicates that innovations in the stochastic trend account for a sizable proportion
of the period-to-period movements. However, the Beveridge-Nelson decomposition
is not unique in that it forces the correlation coefficient between innovations in the
trend and irregular components to be −1. In contrast, the
Hodrick-Prescott filter smooths the trend component of a series. In Chapter 5, you
will be shown a multivariate technique that allows for a unique decomposition of a
series into its temporary and permanent components.


QUESTIONS AND EXERCISES 


1. Given an initial condition for $y_0$, find the solution for $y_t$. Also find the s-step-ahead forecast
$E_t y_{t+s}$:

a. $y_t = y_{t-1} + \varepsilon_t + 0.5\varepsilon_{t-1}$
b. $y_t = 1.1y_{t-1} + \varepsilon_t$
c. $y_t = y_{t-1} + 1 + \varepsilon_t$
d. $y_t = y_{t-1} + t + \varepsilon_t$
e. $y_t = \mu_t + \eta_t + 0.5\eta_{t-1}$, where $\mu_t = \mu_{t-1} + \varepsilon_t$
f. $y_t = \mu_t + \eta_t + 0.5\eta_{t-1}$, where $\mu_t = 0.5 + \mu_{t-1} + \varepsilon_t$
g. Can you make the models of parts b and d stationary?
h. Does model e have an ARIMA(p, 1, q) representation?

2. Given the initial condition $y_0$, find the general solution and the forecast function (i.e., $E_t y_{t+s}$)
for the following variants of the trend plus irregular model:

a. $y_t = \mu_t + v_t$, where $\mu_t = \mu_{t-1} + \varepsilon_t$, $v_t = (1 + \beta_1 L)\eta_t$, and $E\varepsilon_t\eta_t = 0$

b. $y_t = \mu_t + v_t$, where $\mu_t = \mu_{t-1} + \varepsilon_t$, $v_t = (1 + \beta_1 L)\eta_t$, and the correlation between $\varepsilon_t$ and $\eta_t$
equals unity

c. Find the ARIMA representation of each model.

3. As indicated in the text, the ACF of a series with a unit root shows little tendency to decay.
Nevertheless, it may be difficult to detect a unit root in a series with a negative moving average.
Consider the unit root process $y_t = y_{t-1} + \varepsilon_t - 0.8\varepsilon_{t-1}$.

a. Iterate backward from $y_t$ to solve for $y_t$ in terms of the $\{\varepsilon_t\}$ series and the initial condition
$y_0$.

b. Use the method of undetermined coefficients to solve for $y_t$ in terms of the $\{\varepsilon_t\}$ series and the
initial condition $y_0$. [Hint: The solution has the form: $y_t = \sum_{i=0}^{t-1} a_i \varepsilon_{t-i} + y_0$.]

c. Use your answer to part a or b to derive the first few terms of the ACF.

d. Explain how the negative MA term affects the shape of the ACF. In particular, explain
how the series is “infinitely persistent” even though the coefficients of the ACF are far
below unity.

4. Use the data sets that come with this text to perform the following: 

a. The file PANEL.XLS contains the real exchange rates used to generate the results 
reported in Table 4.8. Verify the lag lengths, the values of $\gamma_i$, and the t-statistics reported
in the left-hand side of the table.

b. Does the ERS test confirm the results you found in part a? 

c. The file ERSTEST.XLS contains the data used in Section 10. Reproduce the results 
reported in the text. 

d. The file QUARTERLY.XLS contains the MINSA series used to illustrate the test for 
seasonal unit roots. Make the appropriate data transformations and verify the results 
concerning seasonal unit roots presented in Section 7. 


5. The second column in the file BREAK.XLS contains the simulated data used in Section 8. 


a. Plot the data to see if you can recognize the effects of the structural break. 

b. Verify the results reported in Section 8. 

c. The third column in the file BREAK.XLS contains another simulated data series called 
{y2,} with a structural break at t = 51. Plot the series and compare your graph to those of 
Figures 4.10 and 4.11. 

d. Obtain the ACF and PACF of the {y2,} sequence and first difference of the sequence. Do 
the data appear to be difference stationary? 

e. If you perform a Dickey-Fuller test including a constant and a trend, you should obtain


$\Delta y2_t = 0.072 - 1.1014 \times 10^{-4}\,t - 0.022\,y2_{t-1}$
(1.01) (−0.05) (−0.66)


In addition to the fact that all t-statistics are small, in what other ways is this regression 
inadequate? What diagnostic checks would you want to perform? 

f. Estimate the equation: y2, = a) + a,f + uD, and save the residuals. Perform a 
Dickey—Fuller test on the saved residuals. Perform the appropriate diagnostic tests 
on this regression to ensure that the residuals approximate white noise. You should
conclude that the series is a unit root process with a one-time pulse at t = 51. 
g. Reestimate the model without the insignificant time trend. How is your answer affected? 


6. The file RGDP.XLS contains the real GDP data that were used to estimate (4.29). 


a. Use the series to replicate the results in Section 8. 

b. It is often argued that the oil price shock in 1973 reduced the trend growth rate of real 
U.S. GDP. Perform the Perron test to determine whether the series is trend stationary 
with a break occurring in mid-1973. 

c. Decompose the real GDP series into the temporary and permanent components using
the HP filter and the Beveridge-Nelson decomposition. Plot the transitory component
that you obtain from the HP filter and the one you obtain from the Beveridge-Nelson
decomposition. In what ways are the two series different?

d. Suppose that real GDP is trend stationary with a break occurring in mid-1973. Let the 
deviations from trend constitute the transitory component of the series. How does this 
transitory component compare with your answers found in part c? 


7. The file PANEL.XLS contains the real exchange rate series used to perform the panel unit 
root tests reported in Section 11. 


a. Replicate the results of Section 11. 

b. In what way do the results of the test change if Australia, France, Germany, and the
United States are excluded from the panel? Why is it inappropriate to include or exclude
countries based on their t-statistics?

c. Suppose that you mistakenly included a time trend in the augmented Dickey-Fuller
tests. Determine how the results reported in Section 11 change.

8. The file QUARTERLY.XLS contains the U.S. interest rate data used in Section 10 of
Chapter 2. Form the spread, $s_t$, by subtracting the t-bill rate from the 5-year rate. Recall that
the spread appeared to be quite persistent in that $\rho_1 = 0.86$ and $\rho_2 = 0.68$.

a. One difficulty in performing a unit root test is to select the proper lag length. Using a
maximum of 12 lags, estimate models of the form $\Delta s_t = a_0 + \gamma s_{t-1} + \sum \beta_i \Delta s_{t-i}$. Use the
AIC, SBC, and general-to-specific (GTS) methods to select the appropriate lag length.
You should find that the AIC, SBC, and GTS methods select lag lengths of 9, 1, and 8,
respectively. In this case, does the lag length matter for the Dickey-Fuller test?

b. Use a lag length of 8 and perform an augmented Dickey-Fuller test of the spread. You
should find

$\Delta s_t = 0.255 - 0.211 s_{t-1} + \sum \beta_i \Delta s_{t-i}$
(8.78) (−4.37)

Is the spread stationary?

c. Perform an augmented Dickey-Fuller test of the 5-year rate using seven lags. Is the
5-year rate stationary?

d. Perform an augmented Dickey-Fuller test of the t-bill rate using 11 lags. Is the t-bill rate
stationary?

e. How is it possible that the individual rates act as I(1) processes whereas the spread acts
as a stationary process?

9. The file QUARTERLY.XLS contains the index of industrial production, the money supply
as measured by M1, and the unemployment rate over the 1960Q1–2012Q4 period.

a. Show that the results using this data set verify the finding of Dickey and Fuller (1981)
that industrial production (INDPROD) is I(1). Use the log of INDPROD and select the
lag length using the general-to-specific method.

b. Perform an augmented Dickey–Fuller test on the unemployment rate (UNEMP). If you
use eight lagged changes you will find


Δunemp_t = 0.181 - 0.029unemp_{t-1} + Σβ_iΔunemp_{t-i}
           (2.30)  (-2.25)


Note that the t-statistic on J, is —2.65. 
c. Now estimate the unemployment rate using only one lagged change. You should find


Δunemp_t = 0.226 - 0.037unemp_{t-1} + 0.683Δunemp_{t-1}
           (3.36)  (-3.43)            (13.36)


The residuals show only mild evidence of serial correlation. Consider 


ρ_1     ρ_2     ρ_3     ρ_4     ρ_5     ρ_6     ρ_7     ρ_8
0.01   -0.01    0.08   -0.10   -0.10    0.11    0.14   -0.17


What do you conclude about the stationarity of the unemployment rate? 
d. Regress INDPROD on M1NSA. You should obtain


INDPROD_t = 30.48 + 0.04M1NSA_t
            (29.90)  (36.58)


Examine the ACF of the residuals. Also create a scatter plot of INDPROD_t against
M1NSA_t. How do you interpret the fact that R^2 = 0.98 and that the t-statistic on the
money supply is 36.58?

10. Use the data in the file QUARTERLY.XLS to perform the following: 

a. Perform the DF-GLS test using one lagged change of the log of INDPROD. You should
find that the coefficient on γ is -2.04. (Be sure to include a time trend.)

b. Perform the DF-GLS test using eight lags of the change in UNEMP. You should find that
the coefficient on γ is -1.83.

c. The SBC indicates that only one lagged change of UNEMP is appropriate. Now perform
the DF-GLS test using one lagged change of UNEMP. In what important sense is your
answer quite different from that found in part b?

11. Chapter 6 of the Programming Manual analyzes the real GDP data in the file
QUARTERLY(2012).XLS. Unlike the real GDP data used in the text, the data in this file begin in
1960Q1. Perform parts a through e below using this shorter data set.


a. Form the log of real GDP as ly_t = log(RGDP). Detrend the data with a linear time trend
and form the autocorrelations.

b. Perform an augmented Dickey—Fuller test to determine whether the series is stationary. 
You should find that the sample value of the 7, statistic is —2.16. Interpret the finding that 
the #,-statistic is 6.34. 

c. Verify the result that the difference between potential and real GDP is stationary.

d. Perform the DF-GLS test on the real and the potential GDP series. 

e. Compare the trends obtained from the HP filter and the Beveridge–Nelson decomposition
to the values of potential GDP.

f. The Programming Manual applies the tests by Zivot and Andrews (1992) and Lee
and Strazicich (2003) to the ly_t = log(RGDP) series using data beginning in 1960Q1.
Perform the test on the longer series contained in the file REAL.XLS.


CHAPTER 5 


MULTIEQUATION TIME-SERIES 
MODELS 


Learning Objectives 


1. Introduce intervention analysis and transfer function analysis.

2. Show that transfer function analysis can be a very effective tool for
forecasting and hypothesis testing when it is known that there is no feedback
from the dependent to the so-called independent variable.

3. Use data involving terrorism and tourism in Italy to explain the appropriate
way to estimate an autoregressive distributed lag (ADL).

4. Explain why the major limitation of transfer function and ADL models is
that many economic systems do exhibit feedback.

5. Introduce the concept of a vector autoregression (VAR).

6. Show how to estimate a VAR. Explain why a structural VAR is not identified
from a VAR in standard form.

7. Show how to obtain impulse response and variance decompositions.

8. Explain how to test for lag lengths, Granger causality, and exogeneity in a
VAR.

9. Illustrate the process of estimating a VAR and for obtaining the impulse
responses using transnational and domestic terrorism data.

10. Develop two new techniques, structural VARs and multivariate decompositions,
which blend economic theory and multiple time-series analysis.

11. Illustrate several types of restrictions that can be used to identify a structural
VAR.

12. Show how to test overidentifying restrictions. The method is illustrated
using both macroeconomic and agricultural data.

13. Explain how the Blanchard–Quah restriction of long-run neutrality can be
used to identify a VAR.

14. The Blanchard–Quah decomposition is illustrated using real and nominal
exchange rates.


As we have seen in previous chapters, you can capture many interesting dynamic 
relationships using single-equation time-series methods. In the recent past, many
time-series texts would end with nothing more than a brief discussion of multiequation
models. Yet, one of the most fertile areas of contemporary time-series research 
concerns multiequation models. 

An interesting example concerns the relationship between domestic and transna- 
tional terrorism. Although the events of September 11, 2001, brought terrorism to the 
world’s attention, the international community experienced a sharp increase in transna- 
tional terrorism beginning in the late 1960s. Terrorists engage in a wide variety of oper- 
ations, including assassinations, armed attacks, bombings, kidnappings, and skyjack- 
ings. Such incidents are particularly heinous because they are often directed at innocent 
victims who are not part of the decision-making apparatus that the terrorists seek to 
influence. Figure 5.1 shows the quarterly totals of the number of domestic and transna- 
tional terrorist incidents with at least one casualty that have occurred since 1970Q1 
(excluding events in Iraq and Afghanistan). In a domestic incident, the nationalities of 
the victims and perpetrators are the same as the scene of the incident. Although the num- 
ber of domestic incidents far exceeds the number of transnational incidents, it appears 
that the two series bear a resemblance to each other. Both series seem to rise through- 
out the 1970s and decline around the time of the demise of the Soviet Union. Unlike 
univariate analysis, multivariate techniques allow us to formally analyze the interrela- 
tionships between the two series. You can examine the two series by opening the file 
TERRORISM.XLS. 


FIGURE 5.1 Domestic and Transnational Terrorism
[Panel (a): Domestic Incidents; Panel (b): Transnational Incidents. Vertical axes: incidents per quarter; horizontal axis: 1970 through 2009.]


1. INTERVENTION ANALYSIS


The skyjackings on September 11, 2001, and the bombing of Pan Am flight 103 over
Lockerbie, Scotland, on December 21, 1988, captured the attention of the international
community. However, skyjacking incidents have actually been quite numerous. The
United States launched a critical response to the rise in skyjackings when it began to
install metal detectors in all U.S. airports in January 1973. Other international authorities
soon followed suit.

The quarterly totals of all transnational and U.S. domestic skyjackings are shown
in Figure 5.2. Although the number of skyjacking incidents appears to take a sizable and
permanent decline at this date, we might be interested in actually measuring the effects
of installing the metal detectors. If {y_t} represents the quarterly total of skyjackings, one
might try to take the mean value of {y_t} for all t < 1973Q1 and compare it to the mean
value of {y_t} for all t ≥ 1973Q1. However, such a test is poorly designed because successive
values of y_t are serially correlated. As such, some of the effects of the pre-metal-detector
regime could “carry over” to the postintervention date. For example, some
planned skyjacking incidents already in the pipeline might not be deterred as readily
as others.

Intervention analysis allows for a formal test of a change in the mean of a time 
series. Consider the model used in Enders, Sandler, and Cauley (1990) to study the 
impact of metal detector technology on the number of skyjacking incidents: 


y_t = a_0 + a_1y_{t-1} + c_0z_t + ε_t,   |a_1| < 1      (5.1)


where z_t is the intervention (or dummy) variable that takes on the value of zero prior to
1973Q1 and unity beginning in 1973Q1 and where ε_t is a white-noise disturbance. In
terms of the notation in Chapter 4, z_t is the level shift dummy variable D_L.


FIGURE 5.2 Skyjackings
[Vertical axis: incidents per quarter.]

To explain the nature of the model, notice that, for t < 1973Q1, the value of z_t
is zero. As such, the intercept term is a_0 and the long-run mean of the series is
a_0/(1 - a_1). Beginning in 1973, the intercept term jumps to a_0 + c_0 (since z_{1973Q1}
jumps to unity). Thus, the initial or impact effect of the metal detectors is given by the
magnitude of c_0. The statistical significance of c_0 can be tested using a standard t-test.
We would conclude that metal detectors reduced the number of skyjacking incidents
if c_0 is negative and statistically different from zero.

The long-run effect of the intervention is given by c_0/(1 - a_1), which is equal
to the new long-run mean (a_0 + c_0)/(1 - a_1) minus the value of the original mean
a_0/(1 - a_1). The various transitional effects can be obtained from the impulse response
function. Using lag operators, rewrite (5.1) as


(1 - a_1L)y_t = a_0 + c_0z_t + ε_t

so that

y_t = a_0/(1 - a_1) + c_0 Σ_{i=0}^{∞} a_1^i z_{t-i} + Σ_{i=0}^{∞} a_1^i ε_{t-i}      (5.2)


Equation (5.2) yields the impulse response function; the interesting twist added by
the intervention variable is that we can obtain the responses of the {y_t} sequence to the
intervention. To trace out the effects of metal detectors on skyjackings, suppose that
t = 1973Q1 (so that t + 1 = 1973Q2, t + 2 = 1973Q3, etc.). For time period t, the
impact of z_t on y_t is given by the magnitude of the coefficient c_0. The simplest way
to derive the remaining impulse responses is to recognize that (i) ∂y_t/∂z_{t-i} = ∂y_{t+i}/∂z_t
and (ii) z_{t+i} = z_t = 1 for all i > 0.

Hence, partially differentiate (5.2) with respect to z_{t-1} and update by one period
so that


∂y_{t+1}/∂z_t = c_0 + c_0a_1


The presence of the term c_0 reflects the direct impact of z_{t+1} on y_{t+1}, and the second
term c_0a_1 reflects the effect of z_t on y_t (= c_0) multiplied by the effect of y_t on y_{t+1} (= a_1).
Continuing in this fashion, we can trace out the entire impulse response function as


∂y_{t+j}/∂z_t = c_0[1 + a_1 + ··· + (a_1)^j]


since z_{t+1} = z_{t+2} = ··· = 1.

Taking limits as j → ∞, we can reaffirm that the long-run impact is given by
c_0/(1 - a_1). If it is assumed that 0 < a_1 < 1, the absolute value of the magnitude of
the impacts is an increasing function of j. As we move further away from the date on
which the policy was introduced, the absolute value of the magnitude of the policy
response becomes greater. If -1 < a_1 < 0, the policy has a damped oscillating effect
on the {y_t} sequence. After the initial jump of c_0, the successive values of {y_t} oscillate
toward the long-run level of c_0/(1 - a_1).
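
The responses to a permanent jump in the intervention variable are easy to tabulate. The following Python sketch evaluates c_0[1 + a_1 + ··· + (a_1)^j] and its limit c_0/(1 - a_1). The function names and the values c_0 = -2.0 and a_1 = ±0.5 are illustrative assumptions, not estimates from the skyjacking study.

```python
def step_responses(c0, a1, horizon):
    """Response of y_{t+j} to a permanent unit jump in z at time t.

    Returns c0 * (1 + a1 + ... + a1**j) for j = 0, ..., horizon.
    """
    responses = []
    total = 0.0
    for j in range(horizon + 1):
        total += a1 ** j              # running sum 1 + a1 + ... + a1**j
        responses.append(c0 * total)
    return responses


def long_run_effect(c0, a1):
    """Limit of the step responses, c0 / (1 - a1); requires |a1| < 1."""
    if abs(a1) >= 1:
        raise ValueError("the long-run effect exists only when |a1| < 1")
    return c0 / (1.0 - a1)


# Hypothetical parameter values chosen only to show the two decay patterns.
c0, a1 = -2.0, 0.5
direct = step_responses(c0, a1, horizon=5)         # 0 < a1 < 1: effect builds steadily
oscillating = step_responses(c0, -a1, horizon=5)   # -1 < a1 < 0: damped oscillation
print(direct, long_run_effect(c0, a1))
print(oscillating, long_run_effect(c0, -a1))
```

With a positive a_1 the printed responses grow in absolute value toward the long-run effect; with a negative a_1 they alternate around it while converging.
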

There are several important extensions to the intervention example provided here.
Of course, the model need not be a first-order autoregressive process. A more general
ARMA(p, q) intervention model has the form


y_t = a_0 + A(L)y_{t-1} + c_0z_t + B(L)ε_t


where A(L) and B(L) are polynomials in the lag operator L.

Also, the intervention need not be the pure jump illustrated in Panel (a) of 
Figure 5.3. In our study, the value of the intervention sequence jumps from zero 
to unity in 1973Q1. However, there are several other possible ways to model the 
intervention function: 


1. Pulse function. As shown in Panel (b) of the figure, the function z_t is zero
for all periods, except in one particular period in which z_t is unity. This pulse
function best characterizes a purely temporary intervention. Of course, the
effects of the single impulse may last many periods due to the autoregressive
nature of the {y_t} series.

2. Gradually changing function. An intervention may not reach its full force
immediately. Although the United States began installing metal detectors
in airports in January 1973, it took almost a full year for installations to be
completed at some major international airports. Our intervention study of the
impact of metal detectors on quarterly skyjackings also modeled the z_t series
as 1/4 in 1973Q1, 1/2 in 1973Q2, 3/4 in 1973Q3, and 1.0 in 1973Q4 and all
subsequent periods. This type of intervention function is shown in Panel (c)
of the figure.

3. Prolonged impulse function. Rather than a single pulse, the intervention may
remain in place for one or more periods and then begin to decay. For a short
time, sky marshals were put on many U.S. flights to deter skyjackings. Since
the sky marshal program was allowed to terminate, the {z_t} sequence for sky
marshals might be represented by the decaying function shown in Panel (d) of
Figure 5.3.


FIGURE 5.3 Typical Intervention Functions
[Panel (a): Pure jump; Panel (b): Pulse; Panel (c): Gradually changing; Panel (d): Prolonged pulse. In each panel the vertical axis runs from 0.00 to 1.25 and the horizontal axis covers periods 1 through 10.]
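

The four intervention functions can be written down as simple dummy sequences. The sketch below uses NumPy; the function names, the zero-based period index, and in particular the geometric decay used for the prolonged pulse are assumptions made for illustration, since the text does not specify the exact decay pattern.

```python
import numpy as np


def pure_jump(n, start):
    """z_t = 0 before `start` and 1 from `start` onward (level shift)."""
    z = np.zeros(n)
    z[start:] = 1.0
    return z


def pulse(n, start):
    """z_t = 1 in period `start` only."""
    z = np.zeros(n)
    z[start] = 1.0
    return z


def gradual(n, start, steps):
    """z_t rises in equal increments over `steps` periods, then stays at 1."""
    z = np.zeros(n)
    ramp = np.arange(1, steps + 1) / steps     # 1/4, 1/2, 3/4, 1 when steps = 4
    end = min(start + steps, n)
    z[start:end] = ramp[: end - start]
    z[end:] = 1.0
    return z


def prolonged_pulse(n, start, hold, decay):
    """z_t = 1 for `hold` periods, then shrinks by the factor `decay` each period.

    The geometric decay is an assumed shape, not one taken from the text.
    """
    z = np.zeros(n)
    z[start:start + hold] = 1.0
    for t in range(start + hold, n):
        z[t] = decay * z[t - 1]
    return z


# Ten periods with the intervention beginning in the fourth period (index 3).
print(gradual(10, 3, 4))
```

Calling `gradual(10, 3, 4)` reproduces the 1/4, 1/2, 3/4, 1.0 pattern described for the metal detector study.
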


Be aware that the effects of these interventions change if {y_t} has a unit root. From
the discussion of Perron (1989) in Chapter 4, you should recall that a pulse intervention
will have a permanent effect on the level of a unit root process. Similarly, if {y_t} has a
unit root, a pure jump intervention will act as a drift term. As indicated in Question 1
at the end of this chapter, an intervention will have a temporary effect on a unit root
process if all values of {z_t} sum to zero (e.g., z_t = 1, z_{t+1} = -0.5, z_{t+2} = -0.5, and all
other values of the intervention variable equal zero).
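
A short numerical check makes these three cases concrete. The sketch below traces the deterministic part of a unit root process, y_t = y_{t-1} + c_0z_t, with the noise term omitted so that only the effect of the intervention is visible. The function name, the sample length, and c_0 = 1 are illustrative assumptions.

```python
import numpy as np


def unit_root_path(z, c0=1.0, y0=0.0):
    """Deterministic part of y_t = y_{t-1} + c0 * z_t, starting from y0."""
    return y0 + c0 * np.cumsum(z)


n = 8
pulse = np.zeros(n)
pulse[2] = 1.0                           # one-time intervention
jump = np.zeros(n)
jump[2:] = 1.0                           # permanent intervention
offsetting = np.zeros(n)
offsetting[2:5] = [1.0, -0.5, -0.5]      # intervention values sum to zero

print(unit_root_path(pulse))        # level shifts up permanently
print(unit_root_path(jump))         # level keeps rising: the jump acts as a drift
print(unit_root_path(offsetting))   # level returns to its starting value
```
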

Also be aware that the intervention may affect the variable of interest with a delay.
Suppose that it takes d periods for z_t to begin to have any effect on the series of interest.
It is possible to capture this behavior with a model of the form


y_t = a_0 + A(L)y_{t-1} + c_0z_{t-d} + B(L)ε_t


Often, the shape of the intervention function and the delay factor d are clear from
a priori reasoning. When there is an ambiguity, estimate the plausible alternatives and
then use the standard Box–Jenkins model selection criteria to choose the most appropriate
model. The following two examples illustrate the general estimation procedure.


Estimating the Effect of Metal Detectors 
on Skyjackings 


The linear form of the intervention model y_t = a_0 + A(L)y_{t-1} + c_0z_t + B(L)ε_t assumes
that the coefficients are invariant to the intervention. A useful check of this assumption 
is to pretest the data by estimating the most appropriate ARIMA(p, d, q) models for 
both the pre- and postintervention periods. If the two ARIMA models are quite differ- 
ent, it is likely that the autoregressive and moving average coefficients have changed. 
Usually, there are not enough pre- and postintervention observations to estimate two 
separate models. In such instances, the researcher must be content to proceed using the 
best-fitting ARIMA model over the longest data span. The procedure described below 
is typical of most intervention studies. 


STEP 1: Use the longest data span (i.e., either the pre- or the postintervention
observations) to find a plausible set of ARIMA models.
You should be careful to ensure that the {y_t} sequence is stationary. If
you suspect nonstationarity, you can perform unit root tests on the longest
span of data. Alternatively, you can use the Perron (1989) test for structural
change discussed in Chapter 4. In the presence of d unit roots, estimate the
intervention model using the dth difference of y_t (i.e., Δ^d y_t).

In our study, we were interested in the effects of metal detectors on U.S.
domestic skyjackings, transnational skyjackings (including those involving
the United States), and all other skyjackings. Call each of these time
series {DS_t}, {TS_t}, and {OS_t}, respectively. Since there are only 5 years
of data (i.e., 20 observations) for the preintervention period, we estimated
the best-fitting ARIMA model over the 1973Q1–1988Q4 period. Using the
various criteria discussed in Chapter 2 (including diagnostic checks of the
residuals), we selected an AR(1) model for the {TS_t} and {OS_t} sequences
and a pure noise model (i.e., all autoregressive and moving average coefficients
equal to zero) for the {DS_t} sequence.

STEP 2: Estimate the various models over the entire sample period, including the
effect of the intervention.

The installation of metal detectors was tentatively viewed as an immediate
and permanent intervention. As such, we set z_t = 0 for t < 1973Q1 and
z_t = 1 beginning in 1973Q1. The results of the estimations over the entire
sample period are reported in Table 5.1. As you can see, the installation of
metal detectors reduced each of the three types of skyjacking incidents. The
most pronounced effect was that U.S. domestic skyjackings immediately fell
by more than 5.6 incidents per quarter. All effects are immediate because
the estimate of a_1 is zero. The situation is somewhat different for the {TS_t}
and {OS_t} sequences because the estimated autoregressive coefficients are
different from zero. On impact, transnational skyjackings and other types
of skyjacking incidents fell by 1.29 and 3.9 incidents per quarter, respectively.
The long-run effects on {TS_t} and {OS_t} are estimated to be -1.78
and -5.11 incidents per quarter, respectively.


STEP 3: Perform diagnostic checks of the estimated equations. 


Table 5.1 Metal Detectors and Skyjackings

                            Pre-Intervention              Impact          Long-Run
                            Mean               a_1        Effect (c_0)    Effect

Transnational {TS_t}        3.032              0.276      -1.29           -1.78
                            (5.96)             (2.51)     (-2.21)

U.S. domestic {DS_t}        6.70                          -5.62           -5.62
                            (12.02)                       (-8.73)

Other skyjackings {OS_t}    6.80               0.237      -3.90           -5.11
                            (7.93)             (2.14)     (-3.95)

Notes:
1. t-statistics are in parentheses.
2. The long-run effect is calculated as c_0/(1 - a_1).
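
To make Step 2 concrete, the following Python sketch simulates an AR(1) series with a level-shift intervention and recovers the impact and long-run effects. It is only an illustration under stated assumptions: the data are simulated rather than the skyjacking series, the sample is deliberately long so that the estimates are stable, the parameter values are hypothetical, and ordinary least squares is used because the assumed model has no moving average terms. The function names are not from the text.

```python
import numpy as np


def simulate_intervention(n, a0, a1, c0, break_at, rng):
    """Simulate y_t = a0 + a1*y_{t-1} + c0*z_t + e_t with a level-shift dummy z_t."""
    z = (np.arange(n) >= break_at).astype(float)
    y = np.empty(n)
    y[0] = a0 / (1.0 - a1)                 # start at the pre-intervention mean
    for t in range(1, n):
        y[t] = a0 + a1 * y[t - 1] + c0 * z[t] + rng.standard_normal()
    return y, z


def fit_intervention(y, z):
    """OLS of y_t on a constant, y_{t-1}, and z_t; returns estimates and t-statistics."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1], z[1:]])
    target = y[1:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, t_stats


# Hypothetical parameter values; simulated data stand in for the actual series.
rng = np.random.default_rng(0)
y, z = simulate_intervention(n=4000, a0=3.0, a1=0.3, c0=-1.3, break_at=2000, rng=rng)
(a0_hat, a1_hat, c0_hat), t_stats = fit_intervention(y, z)
long_run = c0_hat / (1.0 - a1_hat)         # long-run effect, as in Table 5.1
print(a0_hat, a1_hat, c0_hat, long_run)
```

The same long-run formula applied to the transnational estimates in Table 5.1 gives -1.29/(1 - 0.276), or about -1.78.
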

Diagnostic checking in Step 3 is particularly important since we have merged the
observations from the pre- and postintervention periods. To reiterate the discussion of
ARIMA models, a well-estimated intervention model will have the following characteristics:


1. The estimated coefficients should be of “high quality.” All coefficients should
be statistically significant at conventional levels. As in all ARIMA modeling,
we wish to use a parsimonious model. If any coefficient is not significant, an
alternative model should be considered. Moreover, the autoregressive coefficients
should imply that the {y_t} sequence is convergent.

2. The residuals should approximate white noise. If the residuals are serially 
correlated, the estimated model does not mimic the actual data-generating 
process. Forecasts from the estimated model cannot possibly make use of 
all available information. If the residuals do not approximate a normal dis- 
tribution, the usual tests of statistical inference are not appropriate in small 
samples. If the errors appear to be ARCH, the entire intervention model can 
be reestimated as an ARCH process. 

3. The tentative model should outperform plausible alternatives. Of course,
no one model can be expected to dominate all others in all possible criteria.
However, it is good practice to compare the results of the maintained model
to those of reasonable rivals. In the skyjacking example, a plausible alternative
was to model the intervention as a gradually increasing process. This is
particularly true because the impact effect was immediate for U.S. domestic
flights and convergent for transnational and other domestic flights. Our conjecture
was that metal detectors were gradually installed in non-U.S. airports
and, even when installed, the enforcement was sporadic. As a check, we modeled
the intervention as gradually increasing over the year 1973. Although the
coefficients were nearly identical to those reported in Table 5.1 for the {TS_t}
and {OS_t} series, the AIC and SBC were slightly lower (indicating a better fit)
using the gradually increasing process. Hence, it is reasonable to conclude
that metal detector adoption was more gradual outside of the United States.


Estimating the Effect of the Libyan Bombing 


We also considered the effects of the U.S. bombing of Libya on the morning of April 
15, 1986. The stated reason for the attack was Libya’s alleged involvement in the ter- 
rorist bombing of the La Belle Discotheque in West Berlin. Since 18 of the F-111 
fighter-bombers were deployed from British bases at Lakenheath and Upper Heyford, 
England, the United Kingdom implicitly assisted in the raid. The remaining U.S. planes 
were deployed from aircraft carriers in the Mediterranean Sea. Now, let y_t denote
all transnational terrorist incidents directed against the United States and the United
Kingdom during month t. A plot of the {y_t} sequence exhibits a large positive spike
immediately after the bombing; the immediate effect seemed to be a wave of anti-U.S.
and anti-U.K. attacks to protest the retaliatory strike. You can see this spike in each of
the two series shown in Figure 5.1. The spikes would be even more pronounced if only
attacks against the United States and United Kingdom were shown.

Preliminary estimates of the monthly data from January 1968 to March 1986 indicated
that the {y_t} sequence could be estimated as a purely autoregressive model with
significant coefficients at lags 1 and 5. We were surprised by a significant coefficient
at lag 5, but the AIC and SBC both indicated that the fifth lag is important. Nevertheless,
we estimated versions of the model with and without the fifth lag. In addition,
we considered two possible patterns for the intervention series. For the first, {z_t} was
modeled as 0 until April 1986 and 1 in all subsequent periods. Using this specification,
we obtained the following estimates (with t-statistics in parentheses):


y_t = 5.58 + 0.336y_{t-1} + 0.123y_{t-5} + 2.65z_t
(5.56) (3.26) (0.84)
AIC = 1656.3     SBC = 1669.95


Note that the coefficient of z_t has a t-statistic of 0.84 (which is not significant at the
0.05 level). Alternatively, when z_t was allowed to be 1 only in the month of the attack,
we obtained


y_t = 3.79 + 0.327y_{t-1} + 0.157y_{t-5} + 38.9z_t
(5.53) (2.59) (6.09)
AIC = 1608.68     SBC = 1626.06


In comparing the two estimates, it is clear that magnitudes of the autoregressive 
coefficients are similar. Although Q-tests indicated that the residuals from both mod- 
els approximate white noise, the second model is preferable. The coefficient on the 
pulse term is highly significant, and the AIC and SBC both select the second specifi- 
cation. Our conclusion was that the Libyan bombing did not have the desired effect of 
reducing terrorist attacks against the United States and the United Kingdom. Instead, 
the bombing caused an immediate increase of more than 38 attacks. Subsequently, the
number of attacks declined; 0.327 of these attacks are estimated to have persisted for
one period (0.327 × 38.9 = 12.7). Since the autoregressive coefficients imply convergence,
the long-run consequences of the raid were estimated to be zero.
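
The decay of the pulse can be traced by iterating the estimated autoregressive coefficients forward from the impact effect. The Python sketch below does this for the second specification, using the point estimates 38.9, 0.327, and 0.157 reported above; the function name and the 60-month horizon are illustrative choices.

```python
def pulse_response(impact, ar_coefs, horizon):
    """Response of y_{t+j} to a one-period pulse of size `impact` at j = 0.

    `ar_coefs` maps each lag to its autoregressive coefficient,
    for example {1: 0.327, 5: 0.157}.
    """
    responses = [impact]
    for j in range(1, horizon + 1):
        value = sum(coef * responses[j - lag]
                    for lag, coef in ar_coefs.items() if j - lag >= 0)
        responses.append(value)
    return responses


# Point estimates from the pulse specification of the Libyan bombing model.
libya = pulse_response(impact=38.9, ar_coefs={1: 0.327, 5: 0.157}, horizon=60)
print([round(r, 2) for r in libya[:7]])
print(libya[-1])    # close to zero: no long-run effect
```

The second element of the list is the 0.327 × 38.9 carried over to the following month, and the fifth lag produces a smaller echo five months after the raid before the responses die out.
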

You can practice estimating an intervention model with the terrorism data shown 
in Figure 5.1. Question 2 at the end of this chapter will guide you through the process. 


2. ADLs AND TRANSFER FUNCTIONS 


A natural extension of the intervention model is to allow the {z_t} sequence to be something
other than a deterministic dummy variable. Consider the following generalization
of the intervention model:


y_t = a_0 + A(L)y_{t-1} + C(L)z_t + B(L)ε_t      (5.3)


where A(L), B(L), and C(L) are polynomials in the lag operator L.

In a typical transfer function analysis, the researcher will collect data on the endogenous
variable {y_t} and on the exogenous variable {z_t}. The goal is to estimate the
parameter a_0 and the parameters of the polynomials A(L), B(L), and C(L). The major
difference between (5.3) and the intervention model is that {z_t} is not constrained to
have a particular deterministic time path. In a sense, the intervention variable is allowed
to be any stationary exogenous process. The model is called a distributed lag in that it
distributes the effects of z_t on y_t across several periods. The polynomial C(L) is called
the transfer function in that it shows how a movement in the exogenous variable
z_t affects the time path of (i.e., is transferred to) the endogenous variable {y_t}. The
coefficients of C(L), denoted by c_i, are called transfer function weights.

It is critical to note that transfer function analysis assumes that {z_t} is an exogenous
process that evolves independently of the {y_t} sequence. Innovations in {y_t} are
assumed to have no effect on the {z_t} sequence so that Ez_tε_{t-s} = 0 for all values of s and
t. Since z_t can be observed and is uncorrelated with the current innovation in y_t (i.e., the
disturbance term ε_t), the current and lagged values of z_t are explanatory variables for
y_t. Let C(L) be c_0 + c_1L + c_2L^2 + ···. If c_0 = 0, the contemporaneous value of z_t does
not directly affect y_t. As such, {z_t} is called a leading indicator in that predictions of y_{t+1}
can be made in period t using z_t, z_{t-1}, ... without the need to predict z_{t+1}.

It is easy to conceptualize numerous applications for (5.3). After all, a large part
of dynamic economic analysis concerns the effects of an “exogenous” or “independent”
sequence {z_t} on the time path of an endogenous sequence {y_t}. For example,
much of the current research in agricultural economics centers on the effects of the
macroeconomy on the agricultural sector. Using (5.3), farm output {y_t} is affected by
its own past and by the current and past state of the macroeconomy {z_t}. The effects
of macroeconomic fluctuations on farm output can be represented by the coefficients
of C(L). Here, B(L)ε_t represents the unexplained portion of farm output. Alternatively,
the level of ozone in the atmosphere {y_t} is a naturally evolving process; hence, in the
absence of other outside influences, we should expect the ozone level to be well represented
by an ARIMA model. However, many have argued that the use of fluorocarbons
has damaged the ozone layer. Because of a cumulative effect, it is argued that current
and past values of fluorocarbon usage affect the value of y_t. Letting z_t denote fluorocarbon
usage in t, it is possible to model the effects of fluorocarbon usage on the ozone
layer using a model in the form of (5.3). The natural dissipation of ozone is captured
through the coefficients of A(L). Stochastic shocks to the ozone layer, possibly due to
electrical storms and the presence of measurement errors, are captured by B(L)ε_t. The
contemporaneous effect of fluorocarbons on the ozone layer is captured by the coefficient
c_0, and the lagged effects are captured by the other transfer function weights (i.e.,
the values of the various c_i).


ADL Models 


At this point, we are not especially concerned about the coefficients of B(L); let
B(L)ε_t = ε_t so that we write (5.3) as


y_t = a_0 + A(L)y_{t-1} + C(L)z_t + ε_t      (5.4)

Since (5.4) contains no moving average terms, it is often called an autoregressive
distributed lag (ADL) model. In contrast to the pure intervention model, there is no
pre- versus postintervention period so that we cannot estimate an ADL in the same
manner we used to estimate an intervention model. However, the methods are very
similar in that the goal is to estimate a parsimonious model. The procedure involved
in fitting an ADL is easiest to explain by considering a simple case of (5.4). To begin,
suppose {z_t} is generated by a white-noise process that is uncorrelated with ε_t at all
leads and lags. In addition, suppose that the realization of z_t affects the {y_t} sequence
with a lag of unknown duration. Specifically, let


y_t = a_1y_{t-1} + c_dz_{t-d} + ε_t      (5.5)


where {z_t} and {ε_t} are white-noise processes such that E(z_tε_{t-i}) = 0; a_1 and c_d are
unknown coefficients, and d is the “delay” or lag duration to be determined by the
econometrician.

Since {z_t} and {ε_t} are assumed to be independent white-noise processes, it is
possible to separately model the effects of each type of shock. Since we can observe
the various z_t values, the first step is to calculate the cross-correlations between y_t and
the various z_{t-i}. The cross-correlation between y_t and z_{t-i} is defined to be


ρ_yz(i) ≡ cov(y_t, z_{t-i})/(σ_yσ_z)


where σ_y and σ_z = the standard deviations of y_t and z_t, respectively. The standard
deviation of each sequence is assumed to be time independent.

Plotting each value of ρ_yz(i) yields the cross-correlation function (CCF) or
cross-correlogram. In practice, we must use the cross-correlations calculated using
sample data because we do not know the true covariances or standard deviations. The
key point is that the examination of the sample cross-correlations provides the same
type of information as the ACF in an ARMA model. To explain, solve (5.5) to obtain

y_t = c_dz_{t-d}/(1 - a_1L) + ε_t/(1 - a_1L)

Use the properties of lag operators to expand the expression c_dz_{t-d}/(1 - a_1L):

y_t = c_d(z_{t-d} + a_1z_{t-d-1} + a_1^2z_{t-d-2} + a_1^3z_{t-d-3} + ···) + ε_t/(1 - a_1L)


Analogously to our derivation of the Yule–Walker equations, we can obtain the
cross-covariances by the successive multiplication of y_t by z_t, z_{t-1}, ... to form


y_tz_t = c_d(z_tz_{t-d} + a_1z_tz_{t-d-1} + a_1^2z_tz_{t-d-2}
         + a_1^3z_tz_{t-d-3} + ···) + z_tε_t/(1 - a_1L)
y_tz_{t-1} = c_d(z_{t-1}z_{t-d} + a_1z_{t-1}z_{t-d-1} + a_1^2z_{t-1}z_{t-d-2}
         + a_1^3z_{t-1}z_{t-d-3} + ···) + z_{t-1}ε_t/(1 - a_1L)
...
y_tz_{t-d} = c_d(z_{t-d}z_{t-d} + a_1z_{t-d}z_{t-d-1} + a_1^2z_{t-d}z_{t-d-2}
         + a_1^3z_{t-d}z_{t-d-3} + ···) + z_{t-d}ε_t/(1 - a_1L)
y_tz_{t-d-1} = c_d(z_{t-d-1}z_{t-d} + a_1z_{t-d-1}z_{t-d-1} + a_1^2z_{t-d-1}z_{t-d-2}
         + a_1^3z_{t-d-1}z_{t-d-3} + ···) + z_{t-d-1}ε_t/(1 - a_1L)

Now take the expected value of each of the above-mentioned equations. If we continue
to assume that {z_t} and {ε_t} are independent white-noise disturbances, it follows
that

Ey_tz_t = 0
Ey_tz_{t-1} = 0
...
Ey_tz_{t-d} = c_dσ_z^2
Ey_tz_{t-d-1} = c_da_1σ_z^2
Ey_tz_{t-d-2} = c_da_1^2σ_z^2
...

so that in compact form,

Ey_tz_{t-i} = 0 for all i < d
            = c_da_1^{i-d}σ_z^2 for i ≥ d      (5.6)

Dividing each value of Ey_tz_{t-i} = cov(y_t, z_{t-i}) by σ_yσ_z yields the CCF. Note that
the cross-correlogram consists of zeroes until lag d. The absolute value of the height of
the first nonzero cross-correlation is positively related to the magnitude of c_d and also
depends on a_1 (through σ_y). Thereafter, the cross-correlations decay at the rate a_1. The
decay of the correlogram matches the autoregressive patterns of the {y_t} sequence.
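

The pattern in (5.6) can be checked on simulated data. The Python sketch below generates a series from (5.5), computes the sample cross-correlations, and compares them with the theoretical cross-covariances. The parameter values (a_1 = 0.8, c_d = 1, d = 3), the unit variances of the two white-noise processes, the sample size, and the function names are all assumptions made for illustration.

```python
import numpy as np


def simulate_delay_model(n, a1, c_d, d, rng):
    """Simulate y_t = a1*y_{t-1} + c_d*z_{t-d} + e_t with white-noise z_t and e_t."""
    z = rng.standard_normal(n)
    e = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        z_lagged = z[t - d] if t >= d else 0.0
        y[t] = a1 * y[t - 1] + c_d * z_lagged + e[t]
    return y, z


def sample_ccf(y, z, max_lag):
    """Sample cross-correlations between y_t and z_{t-i} for i = 0, ..., max_lag."""
    y_dev = y - y.mean()
    z_dev = z - z.mean()
    scale = len(y) * y.std() * z.std()
    return np.array([np.sum(y_dev[i:] * z_dev[:len(z) - i]) / scale
                     for i in range(max_lag + 1)])


def theoretical_cross_cov(a1, c_d, d, sigma_z2, max_lag):
    """Equation (5.6): zero for i < d and c_d * a1**(i - d) * sigma_z2 for i >= d."""
    return np.array([0.0 if i < d else c_d * a1 ** (i - d) * sigma_z2
                     for i in range(max_lag + 1)])


rng = np.random.default_rng(1)
y, z = simulate_delay_model(n=20000, a1=0.8, c_d=1.0, d=3, rng=rng)
ccf = sample_ccf(y, z, max_lag=8)
print(np.round(ccf, 2))                                    # near zero before lag 3
print(theoretical_cross_cov(0.8, 1.0, 3, 1.0, max_lag=8))  # spike at lag 3, then decay
```

In the simulated sample the first three cross-correlations are close to zero, the first sizable value appears at lag d = 3, and successive values shrink by a factor of roughly a_1 = 0.8.
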


The pattern exhibited by (5.6) is easily generalized. Suppose we allow both z_{t-d}
and z_{t-d-1} to directly affect y_t:


y_t = a_1y_{t-1} + c_dz_{t-d} + c_{d+1}z_{t-d-1} + ε_t

Solving for y_t, we obtain

y_t = (c_dz_{t-d} + c_{d+1}z_{t-d-1})/(1 - a_1L) + ε_t/(1 - a_1L)
    = c_d(z_{t-d} + a_1z_{t-d-1} + a_1^2z_{t-d-2} + a_1^3z_{t-d-3} + ···)
      + c_{d+1}(z_{t-d-1} + a_1z_{t-d-2} + a_1^2z_{t-d-3} + a_1^3z_{t-d-4} + ···) + ε_t/(1 - a_1L)

so that

y_t = c_dz_{t-d} + (c_da_1 + c_{d+1})z_{t-d-1} + a_1(c_da_1 + c_{d+1})z_{t-d-2}
      + a_1^2(c_da_1 + c_{d+1})z_{t-d-3} + ··· + ε_t/(1 - a_1L)

Given that we are assuming Ez_t = 0, the cross-covariances are Ey_tz_{t-i} and the
cross-correlations are Ey_tz_{t-i}/(σ_yσ_z). In the literature, it is also common to work with
the standardized cross-covariances denoted by Ey_tz_{t-i}/σ_z^2. The choice between the
two is a matter of indifference since the CCF and the standardized cross-covariance
function (CCVF) are proportional to each other. In the example at hand, the CCVF
reveals the following pattern:¹


γ_yz(i) = 0 for i < d
        = c_d for i = d
        = c_da_1 + c_{d+1} for i = d + 1
        = a_1^{i-d-1}(c_da_1 + c_{d+1}) for i > d + 1


Panel (a) of Figure 5.4 shows the shape of the cross-covariances for $d = 3$, $c_3 = 1$,
$c_4 = 1.5$, and $a_1 = 0.8$. Note that there are distinct spikes at lags 3 and 4 corresponding
to the nonzero values of $c_3$ and $c_4$. Thereafter, the CCVF decays at the rate $a_1$. Panel
(b) of the figure replaces $c_4$ with the value $-1.5$. Again, all cross-covariances are zero
until lag 3; since $c_3 = 1$, the standardized value of $\gamma_{yz}(3) = 1$. To find the standardized
value of $\gamma_{yz}(4)$, form $\gamma_{yz}(4) = 0.8 - 1.5 = -0.7$. The subsequent values of $\gamma_{yz}(i)$
decay at the rate 0.8. The pattern illustrated by these two examples generalizes to any
intervention model of the form

$$y_t = a_0 + a_1 y_{t-1} + C(L) z_t + \varepsilon_t \tag{5.7}$$
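The standardized cross-covariances in the two panels can be reproduced with a short recursion. The sketch below is illustrative only: the function name `theoretical_ccvf` and the dictionary representation of $C(L)$ are assumptions made for this example. It relies on the fact that, with white-noise $\{z_t\}$, each $\gamma_{yz}(i)$ equals $a_1$ times the previous value plus any direct coefficient $c_i$.

```python
def theoretical_ccvf(a1, c, n_lags):
    """Standardized cross-covariances gamma_yz(i) = E[y_t z_{t-i}] / var(z)
    for y_t = a1*y_{t-1} + sum_j c[j]*z_{t-j} + eps_t with white-noise z.

    c maps a lag j to its coefficient c_j; lags that are not listed have c_j = 0.
    """
    gamma = []
    prev = 0.0
    for i in range(n_lags + 1):
        # Previous weight decays at rate a1; a nonzero c_i adds a direct effect (a spike).
        prev = a1 * prev + c.get(i, 0.0)
        gamma.append(prev)
    return gamma


# Parameter values quoted in the text for Panels (a) and (b) of Figure 5.4.
panel_a = theoretical_ccvf(0.8, {3: 1.0, 4: 1.5}, 8)
panel_b = theoretical_ccvf(0.8, {3: 1.0, 4: -1.5}, 8)
print([round(g, 3) for g in panel_a])
print([round(g, 3) for g in panel_b])
```

Both lists are zero through lag 2, show the spike at lag 3, and decay at the rate 0.8 after lag 4, matching the values $\gamma_{yz}(3) = 1$ and $\gamma_{yz}(4) = -0.7$ worked out for Panel (b).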


FIGURE 5.4 Four Cross-Covariance Functions
Panel (a): $y_t = 0.8y_{t-1} + z_{t-3} + 1.5z_{t-4} + \varepsilon_t$
Panel (b): $y_t = 0.8y_{t-1} + z_{t-3} - 1.5z_{t-4} + \varepsilon_t$
Panel (c): $y_t = 0.8y_{t-1} - 0.6y_{t-2} + z_{t-3} + \varepsilon_t$
Panel (d): $y_t = 1.4y_{t-1} - 0.6y_{t-2} + z_{t-3} + \varepsilon_t$

The theoretical CCVF (and CCF) has a shape with the following characteristics:


1. All $\gamma_{yz}(i)$ will be zero until the first nonzero element of the polynomial $C(L)$.

2. A spike in the CCVF indicates a nonzero element of $C(L)$. Thus, a spike at lag
$d$ indicates that $z_{t-d}$ directly affects $y_t$.

3. All spikes decay at the rate $a_1$; convergence implies that the absolute value
of $a_1$ is less than unity. If $0 < a_1 < 1$, decay in the cross-covariances will be
direct, whereas if $-1 < a_1 < 0$, the decay pattern will be oscillatory.


Only the nature of the decay process changes if we generalize equation (5.7) to
include additional lags of $y_{t-i}$. In the general case of (5.4), the decay pattern in the
cross-covariances is determined by the characteristic roots of the polynomial $A(L)$;
the shape is precisely that suggested by the autocorrelations of a pure ARMA model.
This should not come as a surprise; in the examples of (5.5) and (5.7), the decay factor
was simply the first-order coefficient $a_1$. We know that there will be decay since all
characteristic roots of $1 - A(L)$ must be outside the unit circle for the process to be
stationary. Convergence will be consistent with the patterns laid out in Table 2.1.


The Cross-Covariances of a Second-Order Process 
To use another example, consider the ADL:

$$y_t = a_1 y_{t-1} + a_2 y_{t-2} + c_d z_{t-d} + \varepsilon_t$$

Using lag operators to solve for $y_t$ is inconvenient since we do not know the numerical
values of $a_1$ and $a_2$. Instead, use the method of undetermined coefficients and form
the challenge solution:

$$y_t = \sum_{i=0}^{\infty} w_i z_{t-i} + \sum_{i=0}^{\infty} v_i \varepsilon_{t-i}$$

where the $w_i$ and $v_i$ are the undetermined coefficients.
You should be able to verify that the values of the $w_i$ are given by

$$
\begin{aligned}
w_0 &= 0 \\
&\;\;\vdots \\
w_d &= c_d \\
w_{d+1} &= c_d a_1 \\
w_{d+2} &= c_d(a_1^2 + a_2) \\
w_{d+3} &= a_1 w_{d+2} + a_2 w_{d+1} \\
w_{d+4} &= a_1 w_{d+3} + a_2 w_{d+2}
\end{aligned}
$$


Thus, for all $i > d + 1$, the successive coefficients satisfy the difference equation
$w_i = a_1 w_{i-1} + a_2 w_{i-2}$. At this stage, we are not interested in the values of the various
$v_i$, so it is sufficient to write the solution for $y_t$ as

$$y_t = c_d z_{t-d} + c_d a_1 z_{t-d-1} + c_d(a_1^2 + a_2) z_{t-d-2} + \cdots + \sum_{i=0}^{\infty} v_i \varepsilon_{t-i}$$

Next, use this solution for $y_t$ to form all covariances using the Yule-Walker
equations. Forming the expressions for $\gamma_{yz}(i)$ as $E y_t z_{t-i}/\sigma_z^2$:

$$
\begin{aligned}
\gamma_{yz}(i) &= 0 \quad \text{for } i < d \quad [\text{since } E z_{t-j} z_{t-i} = 0 \text{ for all } j \ge d \text{ when } i < d] \\
\gamma_{yz}(d) &= c_d \\
\gamma_{yz}(d+1) &= a_1 c_d \\
\gamma_{yz}(d+2) &= c_d(a_1^2 + a_2)
\end{aligned}
$$


Thus, there is an initial spike at lag $d$ reflecting the nonzero value of $c_d$. After
one period, the proportion $a_1$ of $c_d$ remains. After two periods, the decay pattern in the
standardized cross-covariances begins to satisfy the difference equation:

$$\gamma_{yz}(i) = a_1 \gamma_{yz}(i-1) + a_2 \gamma_{yz}(i-2)$$

Panel (c) of Figure 5.4 shows the shape of the CCVF for the case of $d = 3$, $c_3 = 1$,
$a_1 = 0.8$, and $a_2 = -0.6$. The oscillatory pattern reflects the fact that the characteristic
roots of the process are imaginary. For purposes of comparison, Panel (d) shows the
CCVF of another second-order process with imaginary roots.
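As a rough check on Panels (c) and (d), the following sketch computes the characteristic roots and the first few standardized cross-covariances of a second-order process. The helper names `characteristic_roots` and `ccvf_second_order` are hypothetical, and the roots are taken as the solutions of $\alpha^2 - a_1\alpha - a_2 = 0$, the usual characteristic equation for the homogeneous part of the $\{y_t\}$ process.

```python
import cmath


def characteristic_roots(a1, a2):
    """Roots of alpha^2 - a1*alpha - a2 = 0 for y_t = a1*y_{t-1} + a2*y_{t-2} + ..."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0


def ccvf_second_order(a1, a2, d, c_d, n_lags):
    """gamma_yz(i) for y_t = a1*y_{t-1} + a2*y_{t-2} + c_d*z_{t-d} + eps_t, white-noise z."""
    gamma = [0.0] * (n_lags + 1)
    for i in range(d, n_lags + 1):
        direct = c_d if i == d else 0.0              # spike at lag d only
        lag1 = gamma[i - 1] if i >= 1 else 0.0
        lag2 = gamma[i - 2] if i >= 2 else 0.0
        gamma[i] = a1 * lag1 + a2 * lag2 + direct
    return gamma


roots_c = characteristic_roots(0.8, -0.6)   # Panel (c)
roots_d = characteristic_roots(1.4, -0.6)   # Panel (d)
gamma_c = ccvf_second_order(0.8, -0.6, d=3, c_d=1.0, n_lags=8)
print(roots_c, roots_d)
print([round(g, 3) for g in gamma_c])
```

In both panels the roots come out as complex conjugates with modulus $\sqrt{0.6}$, so the cross-covariances oscillate while shrinking toward zero, and `gamma_c[5]` reproduces $c_d(a_1^2 + a_2)$.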


Higher-Order Input Processes 


The econometrician will rarely be so fortunate as to work with a $\{z_t\}$ series that is
white noise. We need to further generalize our discussion of ADLs to consider the case
in which the $\{z_t\}$ sequence is a stationary AR process. As discussed in the following,
the estimation of the transfer function becomes more difficult in this case. However, the
extra difficulty is worthwhile because a rich set of interactions between the variables is
possible. For a moment, we can abstract from the estimation problem and consider the
system of equations represented by (5.4), reproduced for your convenience, and the
$\{z_t\}$ process:

$$
\begin{aligned}
y_t &= a_0 + A(L)y_{t-1} + C(L)z_t + \varepsilon_t \\
z_t &= D(L)z_{t-1} + \varepsilon_{zt}
\end{aligned}
\tag{5.8}
$$

where $D(L)$ is a polynomial in the lag operator $L$ and $\varepsilon_{zt}$ is white noise. The roots of
$D(L)$ are such that the $\{z_t\}$ sequence is stationary. Since $\{z_t\}$ is independent of $\{y_t\}$,
shocks to the $\{y_t\}$ sequence cannot influence $\{z_t\}$. As such, it must be the case that
$E\varepsilon_{zt}\varepsilon_t = 0$.

Once the coefficients of the two equations have been properly estimated, it is possible
to trace out three impulse response functions. As in Chapter 2, it is possible to
use (5.8) to trace out the impulse responses of an $\varepsilon_{zt}$ shock on the $\{z_t\}$ series or those
of an $\varepsilon_t$ shock on the $\{y_t\}$ sequence. More importantly, it is possible to trace out the effects
of an $\varepsilon_{zt}$ shock on the entire $\{y_t\}$ series. A one-unit shock to $\varepsilon_{zt}$ directly affects $z_t$ by
one unit and $y_t$ by $c_0$ units. It is relatively straightforward for a computer to trace out
the effects of the $\varepsilon_{zt}$ shock on the entire $\{z_t\}$ and $\{y_t\}$ sequences. Formally, the impulse
responses of $\varepsilon_{zt}$ shocks on the $\{y_t\}$ sequence are given by combining (5.4) and (5.8)
such that

$$y_t = a_0 + A(L)y_{t-1} + C(L)[1 - D(L)L]^{-1}\varepsilon_{zt} + \varepsilon_t$$

If you solve for $y_t$, it will be clear that the impulse responses are the coefficients
of $C(L)[1 - D(L)L]^{-1}/[1 - A(L)L]$. In addition, ADLs are useful because they are
conducive to multistep-ahead forecasting. Since $\{z_t\}$ is an independent process, you
can use (5.8) to forecast subsequent values of $z_t$ using the techniques developed in
Chapter 2. As such, if you have $T$ observations, you can use (5.8) to form the forecasts
$E_T z_{T+1}, E_T z_{T+2}, \ldots$ These forecasts are used in the multistep-ahead forecasts for
$y_{T+j}$. For example, suppose that $z_t = d_1 z_{t-1} + \varepsilon_{zt}$ and $y_t = a_1 y_{t-1} + c_1 z_t + \varepsilon_t$. Since the
$j$-step-ahead forecasts for $z_{T+j}$ are $(d_1)^j z_T$, the multistep-ahead forecasts for $y_{T+j}$ are

$$
\begin{aligned}
E_T y_{T+1} &= a_1 y_T + c_1 E_T z_{T+1} = a_1 y_T + c_1 d_1 z_T \\
E_T y_{T+2} &= (a_1)^2 y_T + c_1 d_1(a_1 + d_1) z_T
\end{aligned}
$$
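The same forecasts can be generated recursively for any horizon. The sketch below is a minimal illustration for the first-order example only; the function name `adl_forecasts` and the numerical parameter values are made up for the demonstration.

```python
def adl_forecasts(a1, c1, d1, y_T, z_T, horizon):
    """Multistep-ahead forecasts E_T y_{T+j}, j = 1..horizon, for
    z_t = d1*z_{t-1} + eps_z and y_t = a1*y_{t-1} + c1*z_t + eps."""
    forecasts = []
    y_hat, z_hat = y_T, z_T
    for _ in range(horizon):
        z_hat = d1 * z_hat                 # E_T z_{T+j} = d1^j * z_T
        y_hat = a1 * y_hat + c1 * z_hat    # future shocks have expectation zero
        forecasts.append(y_hat)
    return forecasts


# Hypothetical values chosen only to exercise the formulas.
a1, c1, d1, y_T, z_T = 0.5, 2.0, 0.8, 1.0, 3.0
f = adl_forecasts(a1, c1, d1, y_T, z_T, horizon=2)

# Closed forms from the text for j = 1 and j = 2.
closed_1 = a1 * y_T + c1 * d1 * z_T
closed_2 = a1 ** 2 * y_T + c1 * d1 * (a1 + d1) * z_T
print(f, closed_1, closed_2)
```

The recursion is usually the more convenient route, since the closed-form expressions grow quickly with the horizon and with the orders of $A(L)$ and $D(L)$.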


Identification and Estimation 


Since $\{z_t\}$ evolves independently of $\{y_t\}$, we can use the methodology developed in
Chapter 2 to estimate $\{z_t\}$ as the AR process given by (5.8). The residuals from such a
model, denoted by $\{\hat{\varepsilon}_{zt}\}$, should be white noise. The idea is to estimate the innovations
in the $\{z_t\}$ sequence even though the sequence itself is not a white-noise process.
Once (5.8) has been estimated, you can choose between two techniques to estimate
the ADL. If you are unconcerned about parsimony, you can simply use (5.8) such that

$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{i=0}^{n} c_i z_{t-i} + \varepsilon_t \tag{5.9}$$


Unlike the standard Box—Jenkins approach, begin estimating the ADL using the 
largest values of p and n deemed feasible. Then, F-tests and t-tests can be used to pare 
down the lag lengths of the model. In addition, you could also use the AIC or the SBC to 
find the lag lengths yielding the best fit. As in any time-series estimation, it is crucial to 
perform the appropriate diagnostic checks to ensure that the residuals are white noise. 
The benefit of this method is that it is simple to perform. However, you can easily 
end up with an overly parameterized model. Since $z_t$ and $z_{t-i}$ are correlated (and are
correlated with the values of $y_{t-i}$), it is not straightforward to use t-tests to pare down the
coefficients of C(L). Typically, once the lag lengths p and n are determined, there are no 
further attempts to pare down the model. Nevertheless, the method is quite common and 
is consistent with the vector-autoregressive methodology discussed in Sections 5-13. 

The second method tries to pare down the model in a fashion consistent with the
Box-Jenkins methodology. As in the case where $\{z_t\}$ is white noise, the idea is to
use the cross-correlations to obtain the pattern of the coefficients as they appear in
the ADL. It is tempting to think that we should form the cross-correlations between
the $\{y_t\}$ sequence and $\{\hat{\varepsilon}_{zt-i}\}$. However, this procedure would be inconsistent with the
maintained hypothesis that the structure of the transfer function is given by (5.4). The
reason is that $z_t, z_{t-1}, z_{t-2}, \ldots$ (and not simply the innovations) directly affect the value
of $y_t$. Cross-correlations between $y_t$ and the various $\varepsilon_{zt-i}$ would not reveal the pattern
of the coefficients in $C(L)$.

The appropriate methodology is to filter the $\{y_t\}$ sequence by multiplying (5.4)
by the previously estimated polynomial $D(L)$. As such, the filtered value of $y_t$ is $D(L)y_t$
and is denoted by $y_{ft}$. (Here $D(L)$ stands for the filter that turns $z_t$ into its innovation;
in the AR(1) example that follows, the filter is $1 - d_1 L$.) The cross-correlations of $\varepsilon_{zt}$
and $y_{ft}$ reveal the form of the ADL. To explain, multiply (5.4) by $D(L)$ to obtain

$$D(L)y_t = D(L)a_0 + D(L)A(L)y_{t-1} + C(L)D(L)z_t + D(L)\varepsilon_t$$

Given that $D(L)y_t = y_{ft}$, $D(L)y_{t-1} = y_{ft-1}$, and $D(L)z_t = \varepsilon_{zt}$, this is equivalent to

$$y_{ft} = D(L)a_0 + A(L)y_{ft-1} + C(L)\varepsilon_{zt} + D(L)\varepsilon_t \tag{5.10}$$


Although you could construct the sequence $D(L)y_t$, most software packages can
make the appropriate transformations automatically. The important point is that the
cross-covariances of $y_{ft}$ and $\varepsilon_{zt}$ reveal the coefficients of $C(L)$. We can examine the
CCVF between $y_{ft}$ and $\varepsilon_{zt}$ to determine the spikes and the decay pattern as aids in
determining the form of $C(L)$.

To illustrate why filtering is important, consider the example where $z_t = d_1 z_{t-1} +
\varepsilon_{zt}$ and $y_t = a_1 y_{t-1} + c_1 z_t + \varepsilon_t$. Given that you can never actually observe the form of
the transfer function, you might not be able to deduce that only $z_t$ has a direct effect
on $y_t$. In fact, substitution for $z_t$ yields $y_t = a_1 y_{t-1} + c_1(d_1 z_{t-1} + \varepsilon_{zt}) + \varepsilon_t$. As such, you
might be fooled into estimating an equation of the form

$$y_t = a_1 y_{t-1} + c_1 d_1 z_{t-1} + \varepsilon_{1t}$$

Although there is nothing "wrong" with this equation, the interpretation is such that
$z_t$ affects the $\{y_t\}$ sequence with a one-period lag. It should also be clear that $\mathrm{var}(\varepsilon_{1t}) =
\mathrm{var}(c_1\varepsilon_{zt} + \varepsilon_t)$. Hence, the estimated transfer function will have a larger variance than
that from $y_t = a_1 y_{t-1} + c_1 z_t + \varepsilon_t$. The proper way to identify the form of the transfer
function is to filter the values of $y_t$ such that

$$y_{ft} = (1 - d_1 L)y_t = y_t - d_1 y_{t-1}$$

Since the goal is to form the filtered series as $D(L)y_t$, for the example in hand,
multiply each side of the transfer function by $(1 - d_1 L)$ to obtain

$$(1 - d_1 L)y_t = a_1(1 - d_1 L)y_{t-1} + c_1(1 - d_1 L)z_t + (1 - d_1 L)\varepsilon_t$$

or

$$y_{ft} = a_1 y_{ft-1} + c_1\varepsilon_{zt} + \varepsilon_t - d_1\varepsilon_{t-1}$$

Hence, you can simply subtract $d_1 y_{t-1}$ from $y_t$ to obtain $y_{ft}$. Clearly, the covariances
between $y_{ft}$ and $\varepsilon_{zt}$ will have the same pattern as those between $y_t$ and $z_t$ in the
white-noise case. In summary, the full procedure for fitting an ADL entails:


STEP 1: Estimate the $\{z_t\}$ sequence. The technique used at this stage is precisely
that for estimating any AR model. A properly estimated AR model should
approximate the data-generating process for the $\{z_t\}$ sequence. The calculated
residuals $\{\hat{\varepsilon}_{zt}\}$ are called the filtered values of the $\{z_t\}$ series. These
filtered values can be interpreted as the pure innovations in the $\{z_t\}$ sequence.
Calculate and store the $\{\hat{\varepsilon}_{zt}\}$ sequence.

STEP 2: Identify plausible candidates for the $C(L)$ function. Construct the filtered
$\{y_t\}$ sequence by applying the filter $D(L)$ to each value of $\{y_t\}$; that is, use
the results of Step 1 to obtain $D(L)y_t = y_{ft}$. The cross-correlograms between
$y_{ft}$ and $\hat{\varepsilon}_{zt}$ can help identify the form of $C(L)$. Remember that spikes in the
cross-correlogram indicate nonzero values of $c_i$. In practice, examination
of the cross-correlogram will suggest several plausible transfer functions.
Of course, the sample cross-covariances will not precisely conform to their
theoretical values. Let $r_{yz}(i)$ denote the sample cross-correlation coefficient
between $y_t$ and $z_{t-i}$, and let $T$ denote the number of usable observations. Under
the null hypothesis that the true values of $\rho_{yz}(i)$ all equal zero, the variance of
$r_{yz}(i)$ asymptotically converges to

$$\mathrm{var}[r_{yz}(i)] = (T - i)^{-1}$$

For example, with 100 usable observations, the standard deviation of the
cross-correlation coefficient between $y_t$ and $z_{t-1}$ is approximately equal to
0.10. If the calculated value of $r_{yz}(1)$ exceeds 0.2 (or is less than $-0.2$), the
null hypothesis can be rejected. Significant cross-correlations at lag $i$ indicate
that an innovation in $z_t$ affects the value of $y_{t+i}$. To test the significance of the
first $k$ cross-correlations, use the statistic

$$Q = T(T + 2)\sum_{i=0}^{k} r_{yz}^2(i)/(T - k)$$

Asymptotically, $Q$ has a $\chi^2$ distribution with $(k - p_1 - p_2)$ degrees of
freedom where $p_1$ and $p_2$ denote the number of nonzero coefficients in $A(L)$
and $C(L)$, respectively.

STEP 3: Identify plausible candidates for the $A(L)$ function. Regress $y_t$ (not $y_{ft}$) on
the selected values of $\{z_t\}$ to obtain a model of the form

$$y_t = C(L)z_t + e_t$$

where $e_t$ denotes the error term, which is not necessarily white noise.
The ACF of the $\{e_t\}$ sequence is suggestive of the form of $A(L)$. If the
$\{e_t\}$ sequence is white noise, your task is complete. However, the correlogram
will generally reveal several suggestive forms for $A(L)$. [Note: At this
point, you might want to estimate the more general model of (5.3). If the ACF
and PACF of the $e_t$ series suggest that it might be an ARMA process, form
tentative models for both $A(L)$ and $B(L)$.]

STEP 4: Combine the results of Steps 2 and 3 to estimate the full equation. At this
stage, you will estimate $A(L)$ and $C(L)$ simultaneously. The properties of a
well-estimated model are such that the coefficients are of high quality, the
model is parsimonious, the residuals conform to a white-noise process, and
the forecast errors are small. You should compare your estimated model to
the other plausible candidates from Steps 2 and 3.
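Steps 1 and 2 can be illustrated on simulated data. The sketch below is only a rough demonstration under stated assumptions: the data are generated from the first-order example $z_t = d_1 z_{t-1} + \varepsilon_{zt}$, $y_t = a_1 y_{t-1} + c_1 z_t + \varepsilon_t$ with made-up parameter values, the AR(1) for $\{z_t\}$ is fit by least squares without an intercept, and the helper names `simulate_adl` and `std_cross_cov` are hypothetical. It is not a substitute for the transfer-function routines of an econometrics package.

```python
import random


def simulate_adl(T, a1, c1, d1, seed=0):
    """Simulate z_t = d1*z_{t-1} + eps_z and y_t = a1*y_{t-1} + c1*z_t + eps."""
    rng = random.Random(seed)
    y, z = [0.0], [0.0]
    for _ in range(T):
        z.append(d1 * z[-1] + rng.gauss(0.0, 1.0))
        y.append(a1 * y[-1] + c1 * z[-1] + rng.gauss(0.0, 1.0))
    return y[1:], z[1:]


def std_cross_cov(y, x, lag):
    """Sample E[y_t x_{t-lag}] / var(x), with both series in deviations from their means."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    var_x = sum((v - mx) ** 2 for v in x) / n
    cov = sum((y[t] - my) * (x[t - lag] - mx) for t in range(lag, n)) / n
    return cov / var_x


y, z = simulate_adl(T=20000, a1=0.5, c1=1.0, d1=0.7)

# Step 1: fit an AR(1) to z and keep the residuals (the estimated innovations).
d1_hat = sum(z[t] * z[t - 1] for t in range(1, len(z))) / sum(v * v for v in z[:-1])
eps_z_hat = [z[t] - d1_hat * z[t - 1] for t in range(1, len(z))]

# Step 2: apply the same filter to y, then cross-correlate with the z innovations.
y_f = [y[t] - d1_hat * y[t - 1] for t in range(1, len(y))]
filtered = [std_cross_cov(y_f, eps_z_hat, i) for i in range(4)]
unfiltered = [std_cross_cov(y, z, i) for i in range(4)]
print(round(d1_hat, 3), [round(g, 2) for g in filtered], [round(g, 2) for g in unfiltered])
```

With a sample this long, the filtered cross-covariances should sit close to $c_1 a_1^i$ (a spike at lag 0 followed by decay at the rate $a_1$), whereas the unfiltered ones are inflated by the autocorrelation in $\{z_t\}$ and do not reveal $c_1$ directly.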
There is no doubt that estimating a parsimonious ADL involves judgment on the
part of the researcher. Experienced econometricians would agree that the procedure 
is a blend of skill, art, and perseverance that is developed through practice. Keep in 
mind that the goal is to find a parsimonious representation of a potentially complicated 
interaction among the variables. As in an ARMA process, different models can have 
similar economic implications and yield similar forecasts. Nevertheless, there are some 
hints that can be quite helpful. 


1. After estimating the full model in Step 4, if the residuals in (5.9) are correlated
with $\{z_t\}$, the $C(L)$ function is probably misspecified. Return to Step 3
and reformulate the specifications of $A(L)$ and $C(L)$.

2. The sample cross-correlations are not meaningful if $\{y_t\}$ and/or $\{z_t\}$ are not
stationary. You can test each for a unit root using the procedures discussed in
Chapter 4. In the presence of unit roots, Box and Jenkins (1976) recommend 
differencing each variable until it is stationary. Chapter 6 considers unit roots 
in a multivariate context. For now, it is sufficient to note that this recommen- 
dation can lead to overdifferencing. 


The interpretation of the ADL depends on the type of differencing performed.
Consider the following three specifications and assume that $|a_1| < 1$:

$$y_t = a_1 y_{t-1} + c_0 z_t + \varepsilon_t \tag{5.11}$$
$$\Delta y_t = a_1 \Delta y_{t-1} + c_0 z_t + \varepsilon_t \tag{5.12}$$
$$y_t = a_1 y_{t-1} + c_0 \Delta z_t + \varepsilon_t \tag{5.13}$$

In (5.11), a one-unit shock in $z_t$ has the initial effect of increasing $y_t$ by $c_0$ units.
This initial effect decays at the rate $a_1$. In (5.12), a one-unit shock in $z_t$ has the initial
effect of increasing the change in $y_t$ by $c_0$ units. The effect on the change decays at the
rate $a_1$, but the effect on the level of the $\{y_t\}$ sequence never decays. In (5.13), only the
change in $z_t$ affects $y_t$. Here, a pulse in the $\{z_t\}$ sequence will have a temporary effect
on the level of $\{y_t\}$. Questions 3 and 4 at the end of this chapter are intended to help
you gain familiarity with the different specifications.
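The three interpretations can be compared by tracing the response of the level of $y_t$ to a one-unit pulse in $z_t$ under each specification. This is a minimal sketch with all $\varepsilon_t$ set to zero and zero initial conditions; the function name `pulse_responses` and the values $a_1 = 0.5$, $c_0 = 1$ are illustrative assumptions.

```python
def pulse_responses(a1, c0, horizon):
    """Level of y after a one-unit pulse in z at t = 0 under (5.11), (5.12), and (5.13)."""
    z = [1.0] + [0.0] * horizon
    y511, dy512, level512, y513 = [], [], [], []
    for t in range(horizon + 1):
        z_lag = z[t - 1] if t > 0 else 0.0
        prev511 = y511[-1] if y511 else 0.0
        prev_dy = dy512[-1] if dy512 else 0.0
        prev_level = level512[-1] if level512 else 0.0
        prev513 = y513[-1] if y513 else 0.0
        y511.append(a1 * prev511 + c0 * z[t])            # (5.11): level responds, then decays
        dy512.append(a1 * prev_dy + c0 * z[t])           # (5.12): the change responds
        level512.append(prev_level + dy512[-1])          # accumulate changes into the level
        y513.append(a1 * prev513 + c0 * (z[t] - z_lag))  # (5.13): only the change in z matters
    return y511, level512, y513


y511, level512, y513 = pulse_responses(a1=0.5, c0=1.0, horizon=20)
print(y511[:4], level512[:4], y513[:4])
```

Under (5.11) and (5.13) the level of $y_t$ returns to zero, while under (5.12) it settles at the permanently higher value $c_0/(1 - a_1)$.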

Be aware that it is possible to obtain a more parsimonious model by allowing for 
MA terms. Just as an ARMA model can be more parsimonious than a pure AR model, 
it might be possible that (5.3) provides a more parsimonious fit than (5.4). Moreover, 
it is also possible to allow for MA terms in (5.8). 


3. AN ADL OF TERRORISM IN ITALY 


The clustering of high-profile terrorist events (e.g., the hijacking of TWA flight 847 on 
June 14, 1985; the hijacking of the Achille Lauro cruise ship on October 7, 1985; and 
the Abu Nidal attacks on the Vienna and Rome airports on December 27, 1985) caused 
much speculation in the press about tourists changing their travel plans. Similarly, 
the tourism industry was especially hard-hit after the attacks on September 11, 2001. 
Although opinion polls of prospective tourists suggest that terrorism affects tourism, 
the true impact, if any, can best be discovered through the application of statistical
techniques. Polls conducted in the aftermath of significant incidents cannot indicate
whether respondents actually rebooked trips. Moreover, polls cannot account for 
tourists not surveyed who may have been induced to take advantage of offers designed 
to entice tourists back to a troubled spot. 

To measure the impact of terrorism on tourism, in Enders, Sandler, and Parise
(1992), we constructed the quarterly values of total receipts from tourism for 12
countries. The logarithmic share of each nation's revenues was treated as the dependent
variable $\{y_t\}$, and the number of transnational terrorist incidents occurring within
each nation was treated as the independent variable $\{z_t\}$. The crucial assumption for
the use of intervention analysis is that there is no feedback from tourism to terrorism.
This assumption would be violated if changes in tourism induced terrorists to change
their activities.

Consider an ADL in the form of (5.4):

$$y_t = a_0 + A(L)y_{t-1} + C(L)z_t + B(L)\varepsilon_t$$

where $y_t$ = deseasonalized (with seasonal dummy variables) values of the logarithmic
share of a nation's tourism revenues in quarter $t$ and $z_t$ is the number of transnational
terrorist incidents within that country during quarter $t$.

If we use the methodology developed in the previous section, the first step in fitting
an ADL is to fit an AR model to the $\{z_t\}$ sequence. For illustrative purposes, it is
helpful to consider the Italian case since terrorism in Italy appeared to be white noise
(with a constant mean of 4.20 incidents per quarter). Let $\rho_z(i)$ denote the autocorrelations
between $z_t$ and $z_{t-i}$. If you are following along with the data on the file labeled
ITALY.XLS, be sure to set the sample for 1971Q1 to 1988Q4. The correlogram for
terrorist attacks in Italy is


Correlogram for Terrorist Attacks in Italy

Lag i        0      1      2      3      4      5      6      7      8
rho_z(i)     1   0.14   0.05  -0.06  -0.04   0.13  -0.00   0.01  -0.12

Each value of $\rho_z(i)$ for $i \ge 1$ is less than two standard deviations from zero, and the
Ljung-Box Q-statistics indicate that no groupings are significant. Since terrorist
incidents appear to be a white-noise process, we have completed Step 1; there is no
need to fit an AR model to the series or to filter the $\{y_t\}$ sequence for Italy. At this
point, we conclude that terrorists randomize their acts so that the number of incidents
in quarter $t$ is uncorrelated with the number of incidents in previous periods.
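The white-noise claim can be checked directly from the reported autocorrelations. The sketch below assumes $T = 72$ usable observations (the quarters 1971Q1 through 1988Q4) and uses the standard Ljung-Box form $Q = T(T+2)\sum_{k=1}^{s} r_k^2/(T-k)$; both are assumptions of this example, and the inputs are the rounded values from the correlogram rather than the underlying data.

```python
import math

# Autocorrelations of Italian terrorist incidents at lags 1..8, as reported above.
rho_z = [0.14, 0.05, -0.06, -0.04, 0.13, -0.00, 0.01, -0.12]
T = 72  # assumption: number of quarters in 1971Q1-1988Q4

# Approximate two-standard-deviation band for a white-noise series.
band = 2.0 / math.sqrt(T)
inside_band = [abs(r) < band for r in rho_z]

# Ljung-Box statistic for the first s = 8 autocorrelations.
q8 = T * (T + 2) * sum(r * r / (T - k) for k, r in enumerate(rho_z, start=1))

print(round(band, 3), all(inside_band), round(q8, 2))
```

Every autocorrelation lies inside the band, and the statistic for the first eight lags is well below 15.507, the 5% critical value of a $\chi^2$ distribution with 8 degrees of freedom.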

Step 2 calls for obtaining the cross-correlogram between tourism and terrorism.
The cross-correlogram is

Cross-Correlogram Between Tourism and Terrorism in Italy

Lag i         0      1      2      3      4      5      6      7      8      9     10     11
rho_yz(i) -0.18  -0.23  -0.24  -0.05   0.04   0.13   0.04   0.00   0.11   0.12   0.26   0.19

There are several interesting features of the cross-correlogram:

1. With $T$ observations and $i$ lags, the theoretical value of the standard deviation
of each value of $\rho_{yz}(i)$ is $(T - i)^{-1/2}$. With 73 observations, $T^{-1/2}$ is
approximately equal to 0.117. At the 5% significance level (i.e., two standard
deviations), the sample value of $\rho_{yz}(0)$ is not significantly different from
zero, and $\rho_{yz}(1)$ and $\rho_{yz}(2)$ are just on the margin. However, the Q-statistic
for $\rho_{yz}(0) = \rho_{yz}(1) = \rho_{yz}(2) = 0$ is significant at the 0.01 level. Thus, there
appears to be a strong negative relationship between terrorism and tourism
beginning at lag 1 or 2.

2. It is good practice to examine the cross-correlations between $y_t$ and leading
values of $z_{t+i}$. If the current value of $y_t$ tends to be correlated with future values
of $z_{t+i}$, it might be that the assumption of no feedback is violated. The
presence of a significant cross-correlation between $y_t$ and leads of $z_t$ might
be due to the effect of the current realization of $y_t$ on future values of the $\{z_t\}$
sequence.

3. The large values of $\rho_{yz}(10)$ and $\rho_{yz}(11)$ are suggestive of a possible long-term
effect of terrorism on tourism. Although there are a relatively small number
of total observations, it is wise to entertain the possibility of several plausible
models at this point in the process.
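The arithmetic behind the first feature can be reproduced from the rounded cross-correlations. The sketch below takes $T = 73$ from the text and applies the joint statistic from Step 2 of the procedure, $Q = T(T+2)\sum_{i=0}^{k} r_{yz}^2(i)/(T-k)$, with $k = 2$; because the inputs are rounded to two decimals, the result is only approximate.

```python
# Cross-correlations between tourism and terrorism in Italy at lags 0, 1, 2 (from the table).
r_yz = [-0.18, -0.23, -0.24]
T = 73           # number of observations quoted in the text
k = len(r_yz) - 1

std_dev = T ** -0.5                  # approximate standard deviation of each r_yz(i)
two_sd = 2.0 * std_dev               # two-standard-deviation band
individually_significant = [abs(r) > two_sd for r in r_yz]

# Joint test of rho_yz(0) = rho_yz(1) = rho_yz(2) = 0.
Q = T * (T + 2) * sum(r * r for r in r_yz) / (T - k)

print(round(std_dev, 3), individually_significant, round(Q, 2))
```

The individual values straddle the two-standard-deviation band (lag 1 just inside, lag 2 just outside), which is what "just on the margin" refers to, while the joint statistic is large because all three correlations are negative and of similar size.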


Step 3 entails examining the cross-correlogram and estimating each of the plausible
models. Based on the ambiguous evidence of the cross-correlogram, several different
models for the transfer function were estimated. We estimated models of the form
$y_t = c + C(L)z_t + e_t$, experimenting with delay factors of 0, 1, 2, and 3 quarters. Some
of our estimates are reported in Table 5.2.

Table 5.2 Terrorism and Tourism in Italy (Estimates from Step 3)

             c        c_0        c_1        c_2        c_3      AIC/SBC
Model 1     0.04    -0.0028    -0.0038    -0.0042    -0.001    -5.20/5.97
           (1.86)   (-1.15)    (-1.57)    (-1.76)    (-0.24)
Model 2     0.04    -0.0028    -0.0039    -0.0044              -7.14/1.80
           (1.94)   (-1.15)    (-1.59)    (-1.82)
Model 3     0.03               -0.0042    -0.0044              -7.76/-1.05
           (1.60)              (-1.74)    (-1.84)
Model 4     0.01                          -0.0050              -6.65/-2.17
           (0.87)                         (-2.05)
Model 5     0.01               -0.0048                         -6.30/-1.82
           (0.82)              (-1.96)

Note: The numbers in parentheses are the t-statistics for the null hypothesis of a zero coefficient. To
ensure comparability, all models were estimated over the period 1971Q4-1988Q4.

Model 1 has the form $y_t = c + c_0 z_t + c_1 z_{t-1} + c_2 z_{t-2} + c_3 z_{t-3} + e_t$. The problem
with this specification is that $c_3$ is not significantly different from zero. Eliminating
this coefficient yields Model 2. Notice that most of the coefficients of Model 2 are
not significant at conventional levels. Eliminating the variable $z_t$ yields Model 3 in
which the coefficients $c_1$ and $c_2$ are negative yet marginally significant. The F-test for
the null hypothesis $c_1 = c_2 = 0$ is 3.69 with a significance level of 0.03. As such, there
does seem to be a negative effect of terrorism on tourism. However, we need to be
cautious about such t-tests since the regression residuals $\{e_t\}$ are serially correlated.
Respectively, Models 4 and 5 seek to determine whether it is preferable to eliminate
$z_{t-1}$ or $z_{t-2}$. Overall, the AIC selects Model 3, whereas the SBC selects Model 4 (with a
single delay factor of 2).
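The reason the two criteria can disagree is that the SBC penalizes each extra coefficient more heavily. As a hedged illustration, assume the criteria are computed as $AIC = T\ln(\text{SSR}) + 2n$ and $SBC = T\ln(\text{SSR}) + n\ln(T)$, so that $SBC - AIC = n(\ln T - 2)$, and assume $T = 69$ (the quarters 1971Q4 through 1988Q4). Neither assumption is stated in the table itself. Under them, the SBC column can be recovered from the AIC column and the number of estimated coefficients $n$.

```python
import math

T = 69  # assumption: quarters in the common estimation period 1971Q4-1988Q4

# (number of estimated coefficients including the intercept, reported AIC, reported SBC)
table_5_2 = {
    "Model 1": (5, -5.20, 5.97),
    "Model 2": (4, -7.14, 1.80),
    "Model 3": (3, -7.76, -1.05),
    "Model 4": (2, -6.65, -2.17),
    "Model 5": (2, -6.30, -1.82),
}

penalty_gap = math.log(T) - 2.0   # extra SBC penalty per coefficient, relative to the AIC
implied_sbc = {name: aic + n * penalty_gap for name, (n, aic, _) in table_5_2.items()}
max_gap = max(abs(implied_sbc[name] - sbc) for name, (_, _, sbc) in table_5_2.items())

best_aic = min(table_5_2, key=lambda m: table_5_2[m][1])
best_sbc = min(table_5_2, key=lambda m: table_5_2[m][2])
print({m: round(v, 2) for m, v in implied_sbc.items()}, best_aic, best_sbc)
```

The implied values agree with the reported SBC column up to rounding, and the rankings reproduce the statement that the AIC selects Model 3 while the SBC selects Model 4.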

Since the results at Step 3 are mixed, and the cross-correlogram seems to have two
spikes and exhibits little decay, allow both $z_{t-1}$ and $z_{t-2}$ to directly affect $y_t$. For Step 4,
estimate Model 3 over the full sample period, eliminate the intercept, and obtain the $\{e_t\}$
sequence as $e_t = y_t - 0.00237z_{t-1} - 0.0026z_{t-2}$. The correlogram of the residuals is

Lag i       0     1     2     3     4     5     6     7      8      9     10     11     12
rho(i)    1.0  0.67  0.60  0.47  0.47  0.23  0.14  0.08  -0.08  -0.17  -0.18  -0.24  -0.23

If you experiment a bit, you should find that reasonable models for the $\{e_t\}$ series
are an AR(2), an ARMA(2, ||4||), and an AR(1) with a seasonal AR(1) term. The most
promising of the three seems to be

$$(1 - 0.692L)(1 - 0.379L^4)e_t$$

As such, the tentative transfer function is

$$(1 - 0.692L)(1 - 0.379L^4)y_t = -0.0042z_{t-1} - 0.0044z_{t-2} + \varepsilon_t \tag{5.14}$$


The problem with (5.14) is that the coefficients in the first expression were estimated
separately from the coefficients in the second expression. In Step 4, if you
estimate all coefficients simultaneously you should obtain

$$(1 - \underset{(7.01)}{0.694}L)(1 - \underset{(3.41)}{0.394}L^4)y_t = \underset{(-2.15)}{-0.0030}z_{t-1} \; \underset{(-2.91)}{-\,0.0040}z_{t-2} + \varepsilon_t \tag{5.15}$$

with AIC = −63.52 and SBC = −54.82 (t-statistics are shown beneath the coefficients).

Note that the coefficients of (5.15) are similar to those of (5.14). The Ljung-Box
Q-statistics indicate that the residuals of (5.15) appear to be white noise. For example,
Q(4) = 5.34, Q(8) = 9.11, and Q(12) = 20.26 with significance levels of 0.25, 0.33, and
0.06, respectively. Nevertheless, as $\rho(11) = -0.27$, there might be some information in
the residuals at the very long lags.

Our ultimate aim was to use the estimated transfer function to simulate the effects
of a typical terrorist incident. Initializing the system such that all values of $y_0 = y_1 =
y_2 = y_3 = 0$ and setting all $\{\varepsilon_t\} = 0$, we let the value of $z_t = 1$. Figure 5.5 shows the
impulse response function for this one-unit change in the $\{z_t\}$ sequence. As you can
see from the figure, after a one-period delay, tourism in Italy declines sharply. After
a sustained decline, tourism returns to its initial value in approximately 3 years. As a
result of the multiplicative seasonal term, there is an oscillating decay pattern.
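The simulation just described can be sketched by expanding the two autoregressive factors of (5.15) and iterating forward. The code below is an illustration using the rounded coefficients printed in (5.15), so it will only approximate the figure; the function name `italy_impulse_response` and the choice of a 20-quarter horizon are assumptions made here.

```python
def italy_impulse_response(horizon=20):
    """Response of y_t to a one-unit pulse in z at t = 0, using the estimates in (5.15):
    (1 - 0.694L)(1 - 0.394L^4) y_t = -0.0030 z_{t-1} - 0.0040 z_{t-2}, all eps_t = 0."""
    phi1, phi4 = 0.694, 0.394
    c1, c2 = -0.0030, -0.0040
    z = [1.0] + [0.0] * horizon
    y = []

    def lag(series, t, k):
        # Values dated before the start of the simulation are initialized at zero.
        return series[t - k] if t - k >= 0 else 0.0

    for t in range(horizon + 1):
        # Expanding the product gives lags 1, 4, and 5 of y with coefficients
        # phi1, phi4, and -phi1*phi4, respectively.
        y_t = (phi1 * lag(y, t, 1) + phi4 * lag(y, t, 4) - phi1 * phi4 * lag(y, t, 5)
               + c1 * lag(z, t, 1) + c2 * lag(z, t, 2))
        y.append(y_t)
    return y


response = italy_impulse_response()
print([round(v, 5) for v in response[:13]])
```

The response is zero on impact, turns negative after the one-period delay, deepens in the second quarter, and then shrinks toward zero with a ripple every four quarters from the seasonal factor.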

FIGURE 5.5 Italy's Share of Tourism

Integrating over time and over all incidents allowed us to estimate Italy's total
losses to tourism. The undiscounted losses exceeded 600 million SDR; using a 5% real
interest rate, the total value of the losses exceeded 861 million 1988 SDRs (equal to
6% of Italy's annual tourism revenues). Question 5 at the end of this chapter asks you
to compare the model estimated here to that of (5.9) estimated by simply using the
general-to-specific method.

In the actual paper, we used a slightly different method than the one described here.
Specifically, we allowed the transfer function to have the form $C(L) = E(L)/F(L)$ so that the
estimated model became $y_t = a_0 + A(L)y_{t-1} + E(L)z_t/F(L) + B(L)\varepsilon_t$. Instead of using
very long lags for the $C(L)$ function, the effect of allowing for a polynomial lag in the
denominator is to further spread (or transfer) the effects of $z_t$ shocks over a number of
periods. For example, if $|f_1| < 1$, $z_{t-1}/(1 - f_1 L)$ is $z_{t-1} + f_1 z_{t-2} + f_1^2 z_{t-3} + \cdots$. In this
way, instead of estimating a large number of coefficients for the $C(L)$ function, a single
denominator lag imparts a geometrically decaying effect of the shock. We deemed
this important because the data contain a small number of observations and the correlations
at long lags are large. Most software packages allow for both numerator and
denominator lags in the transfer function. To estimate such a model, in Step 2, you can
experiment with several different forms of low-order $E(L)$ and $F(L)$ functions.
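To see how a single denominator lag spreads a shock over many periods, the term $z_{t-1}/(1 - f_1 L)$ can be built recursively rather than as an infinite sum. This is a minimal sketch; the function name `denominator_lag` and the value $f_1 = 0.5$ are illustrative assumptions.

```python
def denominator_lag(z, f1):
    """Return x_t = z_{t-1}/(1 - f1*L), computed as x_t = f1*x_{t-1} + z_{t-1}.
    Values of z and x dated before the sample are taken to be zero."""
    x = []
    for t in range(len(z)):
        x_prev = x[t - 1] if t > 0 else 0.0
        z_prev = z[t - 1] if t > 0 else 0.0
        x.append(f1 * x_prev + z_prev)
    return x


pulse = [1.0, 0.0, 0.0, 0.0, 0.0]        # a single one-unit incident at t = 0
weights = denominator_lag(pulse, f1=0.5)
print(weights)
```

One parameter, $f_1$, generates the whole sequence of weights $1, f_1, f_1^2, \ldots$ starting one period after the shock, which is the parsimony argument made in the text.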


4. LIMITS TO STRUCTURAL MULTIVARIATE 
ESTIMATION 


There are two important difficulties involved in fitting a multivariate equation such 
as a transfer function. The first concerns the goal of fitting a parsimonious model. 
Obviously, a parsimonious model is preferable to an overparameterized model. In the 
relatively small samples usually encountered in economic data, estimating an unrestricted
model may so severely limit degrees of freedom as to render forecasts useless.
Moreover, the possible inclusion of large but insignificant coefficients will add
variability to the model's forecasts. However, in paring down the form of the model,
two equally skilled researchers will likely arrive at two different transfer functions. 
Other researchers examining the Italian tourism data may have been concerned about 
the correlations at lags 8 and 9 or picked different delay parameters. Although one 
model may have a better “fit” (in terms of the AIC or SBC), the residuals of the other 
may have better diagnostic properties. There is substantial truth to the consensus opin- 
ion that fitting a transfer function model has many characteristics of an “art form.” 
There is a potential cost to using a parsimonious model. Suppose you simply estimate 
the equation $y_t = A(L)y_{t-1} + C(L)z_t + B(L)\varepsilon_t$ using long lags for $A(L)$, $B(L)$, and $C(L)$.
As long as $\{z_t\}$ is exogenous, the estimated coefficients and forecasts are unbiased
even though the model is overparameterized. Such is not the case if the researcher 
improperly imposes zero restrictions on any of the polynomials in the model. 

The second problem concerns the assumption of no feedback from the $\{y_t\}$
sequence to the $\{z_t\}$ sequence. For the coefficients of $C(L)$ to be unbiased estimates
of the impact effects of $\{z_t\}$ on the $\{y_t\}$ sequence, $z_t$ must be uncorrelated with
$\{\varepsilon_t\}$ at all leads and lags. Although certain economic models may assert that policy
variables (such as money supply or government spending) are exogenous, there may
be feedback such that the policy variables are set with specific reference to the state
of other variables in the system. To understand the problem of feedback, suppose
that you were trying to keep a constant 70°F temperature inside your apartment by
turning the thermostat up or down. Of course, the "true" model is that turning up
the heat (the intervention variable $z_t$) warms up your apartment (the $\{y_t\}$ sequence).
However, intervention analysis cannot adequately capture the true relationship in the 
presence of feedback. Clearly, if you perfectly controlled the inside temperature, there 
would be no correlation between the constant value of the inside temperature and the 
movement of the thermostat. Alternatively, you might listen to the weather forecast 
and turn up the thermostat whenever you expected it to be cold. If you underreacted 
by not turning the heat high enough, the cross-correlogram between the two variables 
would tend to show a negative spike reflecting the drop in room temperature with the 
upward movement in the thermostat setting. Instead, if you overreacted by greatly 
increasing the thermostat setting, both room temperature and the thermostat setting 
would rise together. Only if you moved the thermostat setting without reference to 
room temperature would we expect to uncover the actual model. 

The need to restrict the form of the transfer function and the problem of feedback 
or “reverse causality” led Sims (1980) to propose a nonstructural estimation strategy. 
To best understand this Nobel Prize-winning approach, it is useful to consider the state
of macroeconometric modeling that led Sims to his then radical ideas. 


Multivariate Macroeconometric Models: Some 
Historical Background 


Traditionally, macroeconometric hypothesis tests and forecasts were conducted using 
large-scale macroeconometric models. Usually, a complete set of structural equations 
was estimated, one equation at a time. Then, all equations were aggregated in order 
to form overall macroeconomic forecasts. Consider two of the equations from the 
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Brookings Quarterly Econometric Model of the United States, as reported by Suits 
and Sparks (p. 208, 1965): 


\[
C_{NF} = \underset{(0.0165)}{0.0656}\,Y_D - \underset{(2.49)}{10.93}\,(P_{CNF}/P_C)_{t-1} + \underset{(0.0522)}{0.1889}\,(N + N_{ML})_t
\]

\[
C_{NEF} = 4.2712 + \underset{(0.0127)}{0.1691}\,Y_D - \underset{(0.0213)}{0.0743}\,(ALQD_{HH}/P_C)_{t-1}
\]

where $C_{NF}$ = personal consumption expenditures on food
$Y_D$ = disposable personal income
$P_{CNF}$ = implicit price deflator for personal consumption expenditures on food
$P_C$ = implicit price deflator for personal consumption expenditures
$N$ = civilian population
$N_{ML}$ = military population including armed forces overseas
$C_{NEF}$ = personal consumption expenditures for nondurables other than food
$ALQD_{HH}$ = end-of-quarter stock of liquid assets held by households


and standard errors are in parentheses. 

The remaining portions of the model contain estimates for the other components of 
aggregate consumption, investment spending, government spending, exports, imports, 
the financial sector, various price determination equations, and so on. Note that food 
expenditures, but not expenditures on other nondurables, are assumed to depend on 
relative price and population. However, expenditures for other nondurables are assumed 
to depend on real liquid assets held by households in the previous quarter. 

Are such ad hoc behavioral assumptions consistent with economic theory? Sims 
(1980, p. 3) considers such multiequation models and argues that 


“... what ‘economic theory’ tells us about them is mainly that any variable
that appears on the right-hand side of one of these equations belongs
in principle on the right-hand side of all of them. To the extent that models 
end up with very different sets of variables on the right-hand side of these 
equations, they do so not by invoking economic theory, but (in the case 
of demand equations) by invoking an intuitive econometrician’s version 
of psychological and sociological theory, since constraining utility func- 
tions is what is involved here. Furthermore, unless these sets of equations 
are considered as a system in the process of specification, the behavioral 
implications of the restrictions on all equations taken together may be less 
reasonable than the restrictions on any one equation taken by itself.” 


On the other hand, many of the monetarists used reduced-form equations to ascer- 
tain the effects of government policy on the macroeconomy. As an example, con- 
sider the following form of the St. Louis model estimated by Anderson and Jordan 
(1968). Using U.S. quarterly data from 1952 to 1968, they estimated the following 
reduced-form GNP determination equation: 


\[
\begin{aligned}
\Delta Y_t = {} & 2.28 + 1.54\Delta M_t + 1.56\Delta M_{t-1} + 1.44\Delta M_{t-2} + 1.29\Delta M_{t-3} \\
& + 0.40\Delta E_t + 0.54\Delta E_{t-1} - 0.03\Delta E_{t-2} - 0.74\Delta E_{t-3}
\end{aligned}
\tag{5.16}
\]

where $\Delta Y_t$ = change in nominal GNP
$\Delta M_t$ = change in the monetary base
$\Delta E_t$ = change in “high employment” budget deficit.


In their analysis, Anderson and Jordan used base money and the high employ- 
ment budget deficit because these are the variables under the control of the monetary 
and fiscal authorities, respectively. The St. Louis model was an attempt to demon- 
strate the monetarist policy recommendations that changes in the money supply, but 
not changes in government spending or taxation, affected GNP. The t-tests for the indi- 
vidual coefficients are misleading because of the substantial multicollinearity between 
each variable and its lags. However, testing whether the sum of the monetary base 
coefficients (i.e., 1.54 + 1.56 + 1.44 + 1.29 = 5.83) differs from zero yields a t-value 
of 7.25. Hence, they concluded that changes in the money base translate into changes 
in nominal GNP. Since all the coefficients are positive, the effects of monetary pol- 
icy are cumulative. On the other hand, the test that the sum of the fiscal coefficients 
(0.40 + 0.54 — 0.03 — 0.74 = 0.17) equals zero yields a t-value of 0.54. According to 
Anderson and Jordan, the results support “lagged crowding out” in the sense that an 
increase in the budget deficit initially stimulates the economy. Over time, however, 
changes in interest rates and other macroeconomic variables lead to reductions in pri- 
vate sector expenditures. The cumulated effects of the fiscal stimulus are not statisti- 
cally different from zero. 

Sims (1980) also points out several problems with this type of analysis. Sims’s 
criticisms are easily understood by recognizing that (5.16) is a transfer function with 
two independent variables $\{M_t\}$ and $\{E_t\}$ and no lags of the dependent variable. As
with any type of transfer function analysis, we must be concerned with two things: 


1. Ensuring that lag lengths are appropriate. Serially correlated residuals in the 
presence of lagged dependent variables lead to biased coefficient estimates. 

2. Ensuring that there is no feedback between GNP and the money base or the 
budget deficit. However, the assumption of no feedback is unreasonable if the 
monetary or fiscal authorities deliberately attempt to alter nominal GNP. As 
in the thermostat example, if the monetary authority attempts to control the 
economy by changing the money base, we cannot identify the “true” model. 
In the jargon of time-series econometrics, changes in GNP would “cause” 
changes in the money supply. One appropriate strategy would be to simul- 
taneously estimate the GNP determination equation and the money supply 
feedback rule. 


Comparing the two types of models, Sims (1980, pp. 14-15) states:


“Because existing large models contain too many incredible restrictions, 
empirical research aimed at testing competing macroeconomic theories 
too often proceeds in a single- or few-equation framework. For this reason 
alone, it appears worthwhile to investigate the possibility of building large 
models in a style which does not tend to accumulate restrictions so hap- 
hazardly. ... It should be feasible to estimate large-scale macromodels as 
unrestricted reduced forms, treating all variables as endogenous.” 


5. INTRODUCTION TO VAR ANALYSIS


When we are not confident that a variable is actually exogenous, a natural extension
of transfer function analysis is to treat each variable symmetrically. In the two-variable
case, we can let the time path of $\{y_t\}$ be affected by current and past realizations of the
$\{z_t\}$ sequence and let the time path of the $\{z_t\}$ sequence be affected by current and past
realizations of the $\{y_t\}$ sequence. Consider the simple bivariate system:

\[ y_t = b_{10} - b_{12} z_t + \gamma_{11} y_{t-1} + \gamma_{12} z_{t-1} + \varepsilon_{yt} \tag{5.17} \]
\[ z_t = b_{20} - b_{21} y_t + \gamma_{21} y_{t-1} + \gamma_{22} z_{t-1} + \varepsilon_{zt} \tag{5.18} \]

where it is assumed that (i) both $y_t$ and $z_t$ are stationary; (ii) $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are white-noise
disturbances with standard deviations of $\sigma_y$ and $\sigma_z$, respectively; and (iii) $\{\varepsilon_{yt}\}$ and
$\{\varepsilon_{zt}\}$ are uncorrelated white-noise disturbances.

Equations (5.17) and (5.18) constitute a first-order vector autoregression (VAR)
because the longest lag length is unity. This simple two-variable first-order VAR is useful
for illustrating the multivariate higher order systems that are introduced in Section 8.
The structure of the system incorporates feedback because $y_t$ and $z_t$ are allowed to affect
each other. For example, $-b_{12}$ is the contemporaneous effect of a unit change of $z_t$ on
$y_t$ and $\gamma_{12}$ is the effect of a unit change in $z_{t-1}$ on $y_t$. Note that the terms $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are
pure innovations (or shocks) in $y_t$ and $z_t$, respectively. Of course, if $b_{21}$ is not equal to
zero, $\varepsilon_{yt}$ has an indirect contemporaneous effect on $z_t$, and if $b_{12}$ is not equal to zero, $\varepsilon_{zt}$
has an indirect contemporaneous effect on $y_t$. Such a system could be used to capture
the feedback effects in our temperature-thermostat example. The first equation allows
current and past values of the thermostat setting to affect the time path of the temperature;
the second allows for feedback between current and past values of the temperature
and the thermostat setting.

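Equations (5.17) and (5.18) cannot be estimated by OLS since $y_t$ has a contemporaneous
effect on $z_t$ and $z_t$ has a contemporaneous effect on $y_t$. The OLS estimates
would suffer from simultaneous equation bias since the regressors and the error terms
would be correlated. Fortunately, it is possible to transform the system of equations
into a more usable form. Using matrix algebra, we can write the system in the compact
form:

\[
\begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix}
\begin{bmatrix} y_t \\ z_t \end{bmatrix}
=
\begin{bmatrix} b_{10} \\ b_{20} \end{bmatrix}
+
\begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}
\]

or

\[ B x_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t \]

where

\[
B = \begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix}, \quad
x_t = \begin{bmatrix} y_t \\ z_t \end{bmatrix}, \quad
\Gamma_0 = \begin{bmatrix} b_{10} \\ b_{20} \end{bmatrix}, \quad
\Gamma_1 = \begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}, \quad
\varepsilon_t = \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}
\]

Premultiplication by $B^{-1}$ allows us to obtain the VAR model in standard form:

\[ x_t = A_0 + A_1 x_{t-1} + e_t \tag{5.19} \]

where $A_0 = B^{-1}\Gamma_0$, $A_1 = B^{-1}\Gamma_1$, and $e_t = B^{-1}\varepsilon_t$.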

For notational purposes, we can define $a_{i0}$ as element $i$ of the vector $A_0$, $a_{ij}$ as the
element in row $i$ and column $j$ of the matrix $A_1$, and $e_{it}$ as the element $i$ of the vector $e_t$.
Using this new notation, we can rewrite (5.19) in the equivalent form:

\[ y_t = a_{10} + a_{11} y_{t-1} + a_{12} z_{t-1} + e_{1t} \tag{5.20} \]
\[ z_t = a_{20} + a_{21} y_{t-1} + a_{22} z_{t-1} + e_{2t} \tag{5.21} \]

To distinguish between the systems represented by (5.17) and (5.18) versus (5.20)
and (5.21), the first is called a structural VAR or the primitive system and the second
is called a VAR in standard form. It is important to note that the error terms (i.e., $e_{1t}$
and $e_{2t}$) are composites of the two shocks $\varepsilon_{yt}$ and $\varepsilon_{zt}$. Since $e_t = B^{-1}\varepsilon_t$, we can compute
$e_{1t}$ and $e_{2t}$ as

\[ e_{1t} = (\varepsilon_{yt} - b_{12}\varepsilon_{zt})/(1 - b_{12}b_{21}) \tag{5.22} \]
\[ e_{2t} = (\varepsilon_{zt} - b_{21}\varepsilon_{yt})/(1 - b_{12}b_{21}) \tag{5.23} \]
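
The mapping from the primitive system to the standard form can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `structural_to_standard` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def structural_to_standard(b10, b20, b12, b21, g11, g12, g21, g22):
    """Map the primitive system (5.17)-(5.18) into the standard form (5.19)."""
    B = np.array([[1.0, b12], [b21, 1.0]])
    Gamma0 = np.array([b10, b20])
    Gamma1 = np.array([[g11, g12], [g21, g22]])
    B_inv = np.linalg.inv(B)      # requires 1 - b12*b21 != 0
    A0 = B_inv @ Gamma0           # A0 = B^{-1} Gamma0
    A1 = B_inv @ Gamma1           # A1 = B^{-1} Gamma1
    return A0, A1, B_inv

# Illustrative (made-up) structural parameters
b12, b21 = 0.4, 0.2
A0, A1, B_inv = structural_to_standard(1.0, 0.5, b12, b21, 0.5, 0.1, 0.2, 0.3)

# A pair of structural shocks and the implied regression errors e_t = B^{-1} eps_t
eps_y, eps_z = 1.0, -2.0
e = B_inv @ np.array([eps_y, eps_z])

# The same errors computed directly from (5.22) and (5.23)
e1_direct = (eps_y - b12 * eps_z) / (1 - b12 * b21)
e2_direct = (eps_z - b21 * eps_y) / (1 - b12 * b21)
print(e, e1_direct, e2_direct)
```

The two ways of computing the errors agree, which illustrates why each regression error is a composite of both structural shocks.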

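Since $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are white-noise processes, it follows that both $e_{1t}$ and $e_{2t}$ have
zero means and constant variances and are individually serially uncorrelated. To find
the properties of $\{e_{1t}\}$, first take the expected value of (5.22):

\[ E e_{1t} = E(\varepsilon_{yt} - b_{12}\varepsilon_{zt})/(1 - b_{12}b_{21}) = 0 \]

The variance of $e_{1t}$ is given by

\[
\begin{aligned}
E e_{1t}^2 &= E[(\varepsilon_{yt} - b_{12}\varepsilon_{zt})/(1 - b_{12}b_{21})]^2 \\
&= (\sigma_y^2 + b_{12}^2\sigma_z^2)/(1 - b_{12}b_{21})^2
\end{aligned}
\tag{5.24}
\]

Thus, the variance of $e_{1t}$ is time independent. The autocovariances of $e_{1t}$ and $e_{1t-i}$
are

\[ E e_{1t} e_{1t-i} = E[(\varepsilon_{yt} - b_{12}\varepsilon_{zt})(\varepsilon_{yt-i} - b_{12}\varepsilon_{zt-i})]/(1 - b_{12}b_{21})^2 = 0 \quad \text{for } i \neq 0 \]

Similarly, (5.23) can be used to demonstrate that $e_{2t}$ is a stationary process with
zero mean, constant variance, and all autocovariances equal to zero. A critical point to
note is that $e_{1t}$ and $e_{2t}$ are correlated. The covariance of the two terms is

\[
\begin{aligned}
E e_{1t} e_{2t} &= E[(\varepsilon_{yt} - b_{12}\varepsilon_{zt})(\varepsilon_{zt} - b_{21}\varepsilon_{yt})]/(1 - b_{12}b_{21})^2 \\
&= -(b_{21}\sigma_y^2 + b_{12}\sigma_z^2)/(1 - b_{12}b_{21})^2
\end{aligned}
\tag{5.25}
\]
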
In general, (5.25) will not be zero so that the two shocks will be correlated. In
the special case where $b_{12} = b_{21} = 0$ (i.e., if there are no contemporaneous effects of
$y_t$ on $z_t$ and $z_t$ on $y_t$), the shocks will be uncorrelated. It is useful to define the
variance/covariance matrix of the $e_{1t}$ and $e_{2t}$ shocks as

\[
\Sigma = \begin{bmatrix} \mathrm{var}(e_{1t}) & \mathrm{cov}(e_{1t}, e_{2t}) \\ \mathrm{cov}(e_{1t}, e_{2t}) & \mathrm{var}(e_{2t}) \end{bmatrix}
\]

Since all elements of $\Sigma$ are time independent, we can use the more compact form

\[
\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}
\tag{5.26}
\]

where $\mathrm{var}(e_{it}) = \sigma_i^2$ and $\mathrm{cov}(e_{1t}, e_{2t}) = \sigma_{12} = \sigma_{21}$.
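
As a quick check on (5.24) and (5.25), the variance/covariance matrix of the regression errors can be built directly from $e_t = B^{-1}\varepsilon_t$. This is only a sketch under assumed values: the function name `reduced_form_cov` and the numbers for $b_{12}$, $b_{21}$, $\sigma_y$, and $\sigma_z$ are hypothetical.

```python
import numpy as np

def reduced_form_cov(b12, b21, sigma_y, sigma_z):
    """Covariance matrix of e_t = B^{-1} eps_t when eps_yt and eps_zt are uncorrelated."""
    B_inv = np.linalg.inv(np.array([[1.0, b12], [b21, 1.0]]))
    D = np.diag([sigma_y**2, sigma_z**2])   # covariance matrix of the structural shocks
    return B_inv @ D @ B_inv.T

# Illustrative (made-up) values
b12, b21, sigma_y, sigma_z = 0.4, 0.2, 1.0, 2.0
Sigma = reduced_form_cov(b12, b21, sigma_y, sigma_z)

# Closed-form expressions from (5.24) and (5.25)
denom = (1 - b12 * b21) ** 2
var_e1 = (sigma_y**2 + b12**2 * sigma_z**2) / denom
cov_e12 = -(b21 * sigma_y**2 + b12 * sigma_z**2) / denom
print(Sigma)
print(var_e1, cov_e12)
```

Setting both `b12` and `b21` to zero in this sketch makes the off-diagonal elements vanish, which is the special case noted in the text.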


Stability and Stationarity 


In the first-order autoregressive model $y_t = a_0 + a_1 y_{t-1} + \varepsilon_t$, the stability condition is
that $a_1$ be less than unity in absolute value. There is a direct analog between this stability
condition and the matrix $A_1$ in the first-order VAR model of (5.19). Using the brute force
method to solve the system, iterate (5.19) backward to obtain

\[
\begin{aligned}
x_t &= A_0 + A_1(A_0 + A_1 x_{t-2} + e_{t-1}) + e_t \\
&= (I + A_1)A_0 + A_1^2 x_{t-2} + A_1 e_{t-1} + e_t
\end{aligned}
\]

where $I$ = 2 x 2 identity matrix.
After $n$ iterations,

\[ x_t = (I + A_1 + \cdots + A_1^n)A_0 + \sum_{i=0}^{n} A_1^i e_{t-i} + A_1^{n+1} x_{t-n-1} \]

If we continue to iterate backward, it is clear that convergence requires that the
expression $A_1^n$ vanish as $n$ approaches infinity. As shown below, stability requires that
the roots of $(1 - a_{11}L)(1 - a_{22}L) - (a_{12}a_{21}L^2)$ lie outside the unit circle (the stability
condition for higher-order systems is derived in Appendix 6.2 of Chapter 6). For the
time being, if we assume that the stability condition is met, we can write the particular
solution for $x_t$ as

\[ x_t = \mu + \sum_{i=0}^{\infty} A_1^i e_{t-i} \tag{5.27} \]

where $\mu = [\bar{y} \;\; \bar{z}]'$ and

\[ \bar{y} = [a_{10}(1 - a_{22}) + a_{12}a_{20}]/\Delta; \quad \bar{z} = [a_{20}(1 - a_{11}) + a_{21}a_{10}]/\Delta \]
\[ \Delta = (1 - a_{11})(1 - a_{22}) - a_{12}a_{21}. \]
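
The expressions for the unconditional means are the elements of $(I - A_1)^{-1}A_0$, which is easy to confirm numerically. The sketch below assumes NumPy; the function name `unconditional_mean` and the intercept and coefficient values are hypothetical.

```python
import numpy as np

def unconditional_mean(A0, A1):
    """Solve (I - A1) mu = A0 for the long-run mean of a stable first-order VAR."""
    A0 = np.asarray(A0, dtype=float)
    A1 = np.asarray(A1, dtype=float)
    return np.linalg.solve(np.eye(len(A0)) - A1, A0)

# Illustrative (made-up) values
a10, a20 = 1.0, 0.5
a11, a12, a21, a22 = 0.7, 0.2, 0.2, 0.7
mu = unconditional_mean([a10, a20], [[a11, a12], [a21, a22]])

# The scalar formulas given in the text
delta = (1 - a11) * (1 - a22) - a12 * a21
y_bar = (a10 * (1 - a22) + a12 * a20) / delta
z_bar = (a20 * (1 - a11) + a21 * a10) / delta
print(mu, y_bar, z_bar)
```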


If we take the expected value of (5.27), the unconditional mean of $x_t$ is $\mu$; hence, the
unconditional means of $y_t$ and $z_t$ are $\bar{y}$ and $\bar{z}$, respectively. The variances and covariances
of $y_t$ and $z_t$ can be obtained as follows. First, form the variance/covariance matrix as

\[ E(x_t - \mu)^2 = E\left[\sum_{i=0}^{\infty} A_1^i e_{t-i}\right]^2 \]

Next, using (5.26) note that

\[ E e_t^2 = E \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} \begin{bmatrix} e_{1t} & e_{2t} \end{bmatrix} = \Sigma \]

Since $E e_t e_{t-i} = 0$ for $i \neq 0$, it follows that

\[
\begin{aligned}
E(x_t - \mu)^2 &= (I + A_1^2 + A_1^4 + A_1^6 + \cdots)\Sigma \\
&= [I - A_1^2]^{-1}\Sigma
\end{aligned}
\]

where it is assumed that the stability condition holds, so $A_1^n$ approaches zero as $n$
approaches infinity.

If we can abstract from an initial condition, the $\{y_t\}$ and $\{z_t\}$ sequences will be
jointly covariance stationary if the stability condition holds. Each sequence has a finite
and time-invariant mean and a finite and time-invariant variance.

In order to get another perspective on the stability condition, use lag operators to
rewrite the VAR model of (5.20) and (5.21) as

\[ y_t = a_{10} + a_{11}L y_t + a_{12}L z_t + e_{1t} \]
\[ z_t = a_{20} + a_{21}L y_t + a_{22}L z_t + e_{2t} \]

or

\[ (1 - a_{11}L) y_t = a_{10} + a_{12}L z_t + e_{1t} \]
\[ (1 - a_{22}L) z_t = a_{20} + a_{21}L y_t + e_{2t} \]

If we use this last equation to solve for $z_t$, it follows that $L z_t$ is

\[ L z_t = L(a_{20} + a_{21}L y_t + e_{2t})/(1 - a_{22}L) \]

so that

\[ (1 - a_{11}L) y_t = a_{10} + a_{12}L[(a_{20} + a_{21}L y_t + e_{2t})/(1 - a_{22}L)] + e_{1t} \]

Notice that we have transformed the first-order VAR in the $\{y_t\}$ and $\{z_t\}$ sequences
into a second-order stochastic difference equation in the $\{y_t\}$ sequence. Explicitly solving
for $y_t$, we get

\[
y_t = \frac{a_{10}(1 - a_{22}) + a_{12}a_{20} + (1 - a_{22}L)e_{1t} + a_{12}e_{2t-1}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2}
\tag{5.28}
\]

In the same manner, you should be able to demonstrate that the solution for $z_t$ is

\[
z_t = \frac{a_{20}(1 - a_{11}) + a_{21}a_{10} + (1 - a_{11}L)e_{2t} + a_{21}e_{1t-1}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2}
\tag{5.29}
\]

Equations (5.28) and (5.29) both have the same characteristic equation; convergence
requires that the roots of the polynomial $(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2$
must lie outside the unit circle. (If you have forgotten the stability conditions for
second-order difference equations, you might want to refresh your memory by
reexamining Chapter 1.) Just as in any second-order difference equation, the roots
may be real or complex and may be convergent or divergent. Notice that both $y_t$ and $z_t$
have the same characteristic equation; as long as $a_{12}$ and $a_{21}$ do not both equal zero,
the solutions for the two sequences have the same characteristic roots. Hence, both
will exhibit similar time paths.
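
The stability condition can be checked in two equivalent ways: the roots of the inverse characteristic polynomial should lie outside the unit circle, or, equivalently, the eigenvalues of $A_1$ (the characteristic roots) should lie inside it. The following sketch assumes NumPy; the function name `var1_roots` is hypothetical, and the coefficient values are chosen only for illustration.

```python
import numpy as np

def var1_roots(A1):
    """Return (roots of the inverse characteristic polynomial in L, characteristic roots)."""
    A1 = np.asarray(A1, dtype=float)
    a11, a12 = A1[0]
    a21, a22 = A1[1]
    # (1 - a11 L)(1 - a22 L) - a12 a21 L^2
    #   = 1 - (a11 + a22) L + (a11 a22 - a12 a21) L^2
    # np.roots expects the highest power first; a zero leading
    # coefficient is dropped automatically, leaving a single root.
    inverse_roots = np.roots([a11 * a22 - a12 * a21, -(a11 + a22), 1.0])
    char_roots = np.linalg.eigvals(A1)
    return inverse_roots, char_roots

A1 = [[0.7, 0.2], [0.2, 0.7]]
inverse_roots, char_roots = var1_roots(A1)
stable = bool(np.all(np.abs(char_roots) < 1))
print(np.sort(np.abs(inverse_roots)), np.sort(np.abs(char_roots)), stable)
```

Each characteristic root is the reciprocal of a root of the polynomial in $L$, so the two checks always agree.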


Dynamics of a VAR Model


Figure 5.6 shows the time paths of four simple systems. For each system, 100 sets of
normally distributed random numbers representing the $\{e_{1t}\}$ and $\{e_{2t}\}$ sequences were
drawn. The initial values of $y_0$ and $z_0$ were set equal to zero, and the $\{y_t\}$ and $\{z_t\}$
sequences were constructed as in (5.20) and (5.21). Panel (a) uses the values:

\[ a_{10} = a_{20} = 0; \quad a_{11} = a_{22} = 0.7; \quad \text{and} \quad a_{12} = a_{21} = 0.2 \]

When we substitute these values into (5.27), it is clear that the mean of each series
is zero. From the quadratic formula, the two roots of the inverse characteristic equation
$(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2$ are 1.111 and 2.0. Since both are outside the unit circle,
the system is stationary; the two characteristic roots of the solution for $\{y_t\}$ and
$\{z_t\}$ are 0.9 and 0.5. Since these roots are positive, real, and less than
unity, convergence will be direct. As you can see in the figure, there is a tendency for
the sequences to move together. Since $a_{21}$ is positive, a large realization in $y_t$ induces
a large realization of $z_{t+1}$; since $a_{12}$ is positive, a large realization of $z_t$ induces a large
realization of $y_{t+1}$. The cross-correlations between the two series are positive.

Panel (b) illustrates a stationary process with $a_{10} = a_{20} = 0$, $a_{11} = a_{22} = 0.5$, and
$a_{12} = a_{21} = -0.2$. Again, the mean of each series is zero, and the characteristic roots
are 0.7 and 0.3. However, in contrast to the previous case, $a_{21}$ and $a_{12}$ are both negative
so that positive realizations of $y_t$ can be associated with negative realizations of $z_{t+1}$
and vice versa. As can be seen in the second panel, the two series appear
to be negatively correlated.

FIGURE 5.6 Four VAR Processes

In contrast, Panel (c) shows a process possessing a unit root; here, $a_{11} =
a_{22} = a_{12} = a_{21} = 0.5$. You should take a moment to find the characteristic roots.
Undoubtedly, there is little tendency for either of the series to revert to a constant
long-run value. Here, the intercept terms $a_{10}$ and $a_{20}$ are equal to zero so that Panel
(c) represents a multivariate generalization of the random walk model. You can see
how the series seem to meander together. In Panel (d), the VAR process of Panel (c)
also contains a nonzero intercept term ($a_{10} = 0.5$ and $a_{20} = 0$) that takes the role of a
“drift.” As you can see from Panel (d), the two series appear to move closely together.
The drift term adds a deterministic time trend to the nonstationary behavior of the
two series. Combined with the unit characteristic root, the $\{y_t\}$ and $\{z_t\}$ sequences are
joint random walk plus drift processes. Notice that the presence of the drift dominates
the long-run behavior of the series.
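
Series like those in the four panels can be generated with a few lines of code. This is a rough sketch rather than a reproduction of the figure: it assumes NumPy, uses independent standard normal draws for the two error sequences (the text does not state their variances or correlation), and the function name `simulate_var1` and the seed are hypothetical.

```python
import numpy as np

def simulate_var1(A0, A1, n_obs=100, seed=0):
    """Simulate x_t = A0 + A1 x_{t-1} + e_t starting from y_0 = z_0 = 0."""
    rng = np.random.default_rng(seed)
    A0 = np.asarray(A0, dtype=float)
    A1 = np.asarray(A1, dtype=float)
    e = rng.standard_normal((n_obs, 2))   # assumed: independent N(0, 1) errors
    x = np.zeros((n_obs + 1, 2))          # row 0 holds the initial values
    for t in range(1, n_obs + 1):
        x[t] = A0 + A1 @ x[t - 1] + e[t - 1]
    return x

# Intercepts and coefficient matrices for Panels (a) through (d)
panels = {
    "a": ([0.0, 0.0], [[0.7, 0.2], [0.2, 0.7]]),
    "b": ([0.0, 0.0], [[0.5, -0.2], [-0.2, 0.5]]),
    "c": ([0.0, 0.0], [[0.5, 0.5], [0.5, 0.5]]),
    "d": ([0.5, 0.0], [[0.5, 0.5], [0.5, 0.5]]),
}

paths = {name: simulate_var1(A0, A1) for name, (A0, A1) in panels.items()}
largest_root = {
    name: float(np.max(np.abs(np.linalg.eigvals(np.array(A1)))))
    for name, (A0, A1) in panels.items()
}
print(largest_root)
```

The largest characteristic root is below one for Panels (a) and (b) and equal to one for Panels (c) and (d), in line with the discussion above. Each column of a simulated path can be plotted against time to compare with the figure.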


6. ESTIMATION AND IDENTIFICATION 


One explicit aim of the Box-Jenkins approach is to provide a methodology that leads to
parsimonious models. The ultimate objective of making accurate short-term forecasts 
is best served by purging insignificant parameter estimates from the model. Sims’s 
(1980) criticisms of the “incredible identification restrictions” inherent in structural 
models argue for an alternative estimation strategy. Consider the following multivariate 
generalization of an autoregressive process: 


\[ x_t = A_0 + A_1 x_{t-1} + A_2 x_{t-2} + \cdots + A_p x_{t-p} + e_t \tag{5.30} \]

where $x_t$ = an ($n \times 1$) vector containing each of the $n$ variables included in the VAR
$A_0$ = an ($n \times 1$) vector of intercept terms
$A_i$ = ($n \times n$) matrices of coefficients
$e_t$ = an ($n \times 1$) vector of error terms.



Sims’s methodology entails little more than a determination of the appropriate 
variables to include in the VAR and a determination of the appropriate lag length. The 
variables to be included in the VAR are selected according to the relevant economic 
model. Lag length tests (to be discussed below) select the appropriate lag length. Oth- 
erwise, no explicit attempt is made to “pare down” the number of parameter estimates. 
The vector $A_0$ contains $n$ parameters, and each matrix $A_i$ contains $n^2$ parameters; hence,
$n + pn^2$ coefficients need to be estimated. Unquestionably, a VAR will be overparameterized
in that many of these coefficient estimates will be insignificant. However, the
goal is to find the important interrelationships among the variables. Improperly impos- 
ing zero restrictions may waste important information. Moreover, the regressors are 
likely to be highly collinear so that the t-tests on individual coefficients are not reliable 
guides for paring down the model. 
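
To get a feel for how quickly the number of coefficients grows, the count $n + pn^2$ can be tabulated for a few system sizes. The helper name `var_coefficient_count` and the example sizes below are hypothetical.

```python
def var_coefficient_count(n, p):
    """Number of intercepts and slope coefficients in an n-variable VAR with p lags."""
    return n + p * n**2

# A few illustrative system sizes
for n, p in [(2, 1), (4, 4), (6, 8)]:
    print(n, p, var_coefficient_count(n, p))
```

With quarterly data, even a modest system can use up a large share of the available degrees of freedom.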

Note that the right-hand side of (5.30) contains only predetermined variables and 
that the error terms are assumed to be serially uncorrelated with constant variance. 
Hence, each equation in the system can be estimated using OLS. Moreover, OLS esti- 
mates are consistent and asymptotically efficient. Even though the errors are correlated 
across equations, seemingly unrelated regressions (SUR) do not add to the efficiency of
the estimation procedure since all regressions have identical right-hand side variables. 
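
Equation-by-equation OLS for a first-order VAR amounts to regressing each variable on a constant and the lagged values of all variables. The sketch below assumes NumPy and uses simulated data; the function name `estimate_var1_ols`, the true coefficient matrix, the sample size, and the seed are all hypothetical choices made for illustration.

```python
import numpy as np

def estimate_var1_ols(x):
    """OLS estimates of A0, A1 and the residual covariance matrix for a VAR(1)."""
    x = np.asarray(x, dtype=float)
    Y = x[1:]                                         # left-hand side: x_t
    X = np.column_stack([np.ones(len(Y)), x[:-1]])    # constant and x_{t-1}
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)      # one column per equation
    A0_hat = coef[0]
    A1_hat = coef[1:].T                               # row i holds equation i's slopes
    resid = Y - X @ coef
    Sigma_hat = resid.T @ resid / (len(Y) - X.shape[1])
    return A0_hat, A1_hat, Sigma_hat

# Simulated data from a known (made-up) VAR(1) with zero intercepts
rng = np.random.default_rng(0)
A1_true = np.array([[0.7, 0.2], [0.2, 0.7]])
x = np.zeros((5001, 2))
for t in range(1, len(x)):
    x[t] = A1_true @ x[t - 1] + rng.standard_normal(2)

A0_hat, A1_hat, Sigma_hat = estimate_var1_ols(x)
print(A0_hat)
print(A1_hat)
print(Sigma_hat)
```

Because every equation has the same regressors, solving all columns in a single least-squares call gives the same estimates as running the regressions one at a time.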

There is an issue of whether the variables in a VAR need to be stationary. Sims 
(1980) and Sims, Stock, and Watson (1990) recommended against differencing even 
if the variables contain a unit root. They argued that the goal of a VAR analysis is to 
determine the interrelationships among the variables, not to determine the parameter 
estimates. The main argument against differencing is that it “throws away” informa- 
tion concerning the comovements in the data (such as the possibility of cointegrating 
relationships). Similarly, it is argued that the data need not be detrended. In a VAR, a 
trending variable will be well approximated by a unit root plus drift. However, the 
majority view is that the form of the variables in the VAR should mimic the true 
data-generating process. This is particularly true if the aim is to estimate a structural 
model. We return to these issues in Chapter 6; for now, it is assumed that all variables 
are stationary. Questions 9 and 10 at the end of this chapter ask you to compare a VAR 
in levels to a VAR in first differences. 


Forecasting 


Once the VAR has been estimated, it can be used as a multiequation forecasting model. 
Suppose you estimate the first-order model $x_t = A_0 + A_1 x_{t-1} + e_t$ so as to obtain the
values of the coefficients in $A_0$ and $A_1$. If your data run through period $T$, it is straightforward
to obtain the one-step-ahead forecasts of your variables using the relationship
$E_T x_{T+1} = A_0 + A_1 x_T$. Similarly, a two-step-ahead forecast can be obtained recursively
from $E_T x_{T+2} = A_0 + A_1 E_T x_{T+1} = A_0 + A_1[A_0 + A_1 x_T]$. Nevertheless, in a higher-order
model, there can be a large number of coefficient estimates. Since unrestricted VARs 
are overparameterized, the forecasts may be unreliable. In order to obtain a parsimo- 
nious model, many forecasters would purge the insignificant coefficients from the VAR. 
After reestimating the so-called near-VAR model using SUR, it could be used for fore- 
casting purposes. Others might use a Bayesian approach by combining a set of prior 
beliefs with the traditional VAR methods presented in this text. West and Harrison 
(1989) provided an approachable introduction to the Bayesian approach. Litterman 
(1980) proposed a sensible set of Bayesian priors that have become the standard in 
Bayesian VAR models. 
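
The recursive forecasting rule for the first-order model can be sketched in a few lines. The example assumes NumPy; the function name `forecast_var1` and the coefficient values and terminal observation are hypothetical.

```python
import numpy as np

def forecast_var1(A0, A1, x_T, steps):
    """Iterate E_T x_{T+j} = A0 + A1 E_T x_{T+j-1}, starting from the last observation x_T."""
    A0 = np.asarray(A0, dtype=float)
    A1 = np.asarray(A1, dtype=float)
    x = np.asarray(x_T, dtype=float)
    forecasts = []
    for _ in range(steps):
        x = A0 + A1 @ x
        forecasts.append(x)
    return np.array(forecasts)

# Illustrative (made-up) estimates and last observation
A0 = np.array([1.0, 0.5])
A1 = np.array([[0.7, 0.2], [0.2, 0.7]])
x_T = np.array([10.0, 5.0])

forecasts = forecast_var1(A0, A1, x_T, steps=200)
print(forecasts[0])    # one-step-ahead forecast
print(forecasts[1])    # two-step-ahead forecast
print(forecasts[-1])   # long-horizon forecast
```

For a stable system, the long-horizon forecasts settle down at the unconditional mean.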

An interesting use of forecasting with a VAR is provided by the four-equation VAR
of Eckstein and Tsiddon (2004). The aim of the study was to investigate the effects of
terrorism ($T$) on the growth rates of Israeli real per capita GDP ($\Delta GDP_t$), investment
($\Delta I_t$), exports ($\Delta EXP_t$), and nondurable consumption ($\Delta NDC_t$). The authors use quarterly
data running from 1980Q1 to 2003Q3 so that there are 95 total observations. The
measure of terrorism is a weighted average of the number of Israeli fatalities, injuries,
and noncasualty incidents due to both domestic and transnational attacks occurring in
Israel. Consider a simplified version of their VAR model:

\[
\begin{bmatrix} \Delta GDP_t \\ \Delta I_t \\ \Delta EXP_t \\ \Delta NDC_t \end{bmatrix}
=
\begin{bmatrix} A_{11}(L) & \cdots & A_{14}(L) \\ \vdots & \ddots & \vdots \\ A_{41}(L) & \cdots & A_{44}(L) \end{bmatrix}
\begin{bmatrix} \Delta GDP_{t-1} \\ \Delta I_{t-1} \\ \Delta EXP_{t-1} \\ \Delta NDC_{t-1} \end{bmatrix}
+
\begin{bmatrix} c_1 T_{t-1} \\ c_2 T_{t-1} \\ c_3 T_{t-1} \\ c_4 T_{t-1} \end{bmatrix}
+
\begin{bmatrix} e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \end{bmatrix}
\]

where the expressions $A_{ij}(L)$ are polynomials in the lag operator $L$, the $c_i$ measure the
influence of lagged terrorism on variable $i$, and the $e_{it}$ are the regression errors. The
other right-hand side variables (not shown) are the first difference of the real interest
rate, three quarterly seasonal dummies, and an intercept.

The nature of the VAR is such that $\Delta GDP_t$, $\Delta I_t$, $\Delta EXP_t$, and $\Delta NDC_t$ are all jointly
determined. In contrast, the terrorism variable acts as an independent variable in the
system. Notice that the magnitude of $T_{t-1}$ is allowed to affect the four macroeconomic
variables, but there is no feedback from these variables to the level of terrorism. The
authors report that lagging the terrorism variable for a single period provided a better
fit than the use of other lag lengths.

The four equations of the model were estimated through 2003Q3 and used to
obtain 1 through 12-step-ahead forecasts of $\Delta GDP_t$, $\Delta I_t$, $\Delta EXP_t$, and $\Delta NDC_t$. Unlike
forecasting with a pure VAR (in which all variables are jointly determined), it was
necessary for Eckstein and Tsiddon (2004) to specify the time path of the terrorism variable.
Consider the VAR representation of their model $x_t = A_0 + A_1 x_{t-1} + cT_{t-1} + e_t$,
where $c$ is the 4 x 1 vector $[c_1, c_2, c_3, c_4]'$. The one-step-ahead forecast is $E_T x_{T+1} =
A_0 + A_1 x_T + cT_T$, and the two-step-ahead forecast is $E_T x_{T+2} = A_0 + A_1 E_T x_{T+1} + cT_{T+1}$.
Hence, in order to forecast the values of $x_{T+2}$ and beyond, it is necessary to know the
magnitude of the terrorism variable over the forecast period. Toward this end, Eckstein
and Tsiddon supposed that all terrorism actually ended in 2003Q4 (so that all values
of $T_j = 0$ for $j$ > 2003Q4). Under this assumption, the annual growth rate of GDP was
estimated to be 2.5% through 2005Q3. Instead, when they set the values of $T_j$ at the
2000Q4-2003Q4 period average, the growth rate of GDP was estimated to be zero.
Thus, a steady level of terrorism would have cost the Israeli economy all of its real output
gains. In actuality, the largest influence of terrorism was found to be on investment.
The impact of terrorism on investment was twice as large as the impact on real GDP.
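
The role of the assumed time path of the independent variable can be seen in a small scenario comparison. The sketch below is not the Eckstein and Tsiddon model: it uses a two-variable system with made-up coefficients, assumes NumPy, and the function name `forecast_with_exogenous` and the two scenario paths are hypothetical.

```python
import numpy as np

def forecast_with_exogenous(A0, A1, c, x_T, T_path):
    """Forecast x_t = A0 + A1 x_{t-1} + c T_{t-1}.

    T_path[0] is the last observed value T_T; later entries are the
    assumed future values T_{T+1}, T_{T+2}, ...
    """
    x = np.asarray(x_T, dtype=float)
    forecasts = []
    for T_lag in T_path:
        x = A0 + A1 @ x + c * T_lag
        forecasts.append(x)
    return np.array(forecasts)

# Illustrative (made-up) two-variable system
A0 = np.array([1.0, 0.5])
A1 = np.array([[0.7, 0.2], [0.2, 0.7]])
c = np.array([-0.5, -1.0])
x_T = np.array([8.0, 7.0])

no_terror = forecast_with_exogenous(A0, A1, c, x_T, T_path=[0.0, 0.0])
steady_terror = forecast_with_exogenous(A0, A1, c, x_T, T_path=[1.0, 1.0])
print(no_terror)
print(steady_terror)
```

The difference between the two forecast paths is driven entirely by the assumed values of the independent variable, which is why the scenario must be specified before forecasting.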


Identification 


Suppose that you want to recover the structural VAR from your estimate of the
model in standard form. To illustrate the identification procedure, return to the
two-variable/first-order VAR of the previous section. Due to the feedback inherent in
a VAR process, the primitive equations (5.17) and (5.18) cannot be estimated directly.
The reason is that $z_t$ is correlated with the error term $\varepsilon_{yt}$ and that $y_t$ is correlated
with the error term $\varepsilon_{zt}$. Standard estimation techniques require that the regressors be
uncorrelated with the error term. Note that there is no such problem in estimating the
VAR system in the form of (5.20) and (5.21). OLS can provide estimates of the two
elements of $A_0$ and the four elements of $A_1$. Moreover, obtaining the residuals from
the two regressions, it is possible to calculate estimates of the variances of $e_{1t}$ and $e_{2t}$ and
of the covariance between $e_{1t}$ and $e_{2t}$. The issue is whether it is possible to recover all
of the information present in the primitive system given by (5.17) and (5.18). In other
words, is the primitive system identifiable given the OLS estimates of the VAR model
in the form of (5.20) and (5.21)?

The answer to this question is, “No, unless we are willing to appropriately restrict
the primitive system.” The reason is clear if we compare the number of parameters
of the primitive system with the number of parameters recovered from the estimated
VAR model. Estimating (5.20) and (5.21) yields six coefficient estimates ($a_{10}$, $a_{20}$,
$a_{11}$, $a_{12}$, $a_{21}$, and $a_{22}$) and the calculated values of $\mathrm{var}(e_{1t})$, $\mathrm{var}(e_{2t})$, and $\mathrm{cov}(e_{1t}, e_{2t})$.
However, the primitive system (5.17) and (5.18) contains 10 parameters. In addition to
the two intercept coefficients $b_{10}$ and $b_{20}$, the four autoregressive coefficients $\gamma_{11}$, $\gamma_{12}$,
$\gamma_{21}$, and $\gamma_{22}$, and the two feedback coefficients $b_{12}$ and $b_{21}$, there are the two standard
deviations $\sigma_y$ and $\sigma_z$. In all, the primitive system contains 10 parameters, whereas the
VAR estimation yields only 9 parameters. Unless one is willing to restrict one of the
parameters, it is not possible to identify the primitive system; equations (5.17) and
(5.18) are underidentified.

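One way to identify the model is to use the type of recursive system proposed by
Sims (1980). Suppose that you are willing to impose a restriction on the primitive system
such that the coefficient $b_{21}$ is equal to zero. Of course, forcing $b_{21} = 0$ imposes an
asymmetry on the system in that $z_t$ has a contemporaneous effect on $y_t$ but $y_t$ affects the
$\{z_t\}$ sequence with a one-period lag. Nevertheless, it should be clear that this restriction
(which might be suggested by a particular economic model) results in an exactly
identified system. Writing (5.22) and (5.23) with the constraint imposed yields

\[ e_{1t} = \varepsilon_{yt} - b_{12}\varepsilon_{zt} \]
\[ e_{2t} = \varepsilon_{zt} \]

so that

\[ \mathrm{var}(e_1) = \sigma_y^2 + b_{12}^2\sigma_z^2 \tag{5.31} \]
\[ \mathrm{var}(e_2) = \sigma_z^2 \tag{5.32} \]
\[ \mathrm{cov}(e_1, e_2) = -b_{12}\sigma_z^2 \tag{5.33} \]

Equations (5.31) through (5.33) consist of three equations in three unknowns.
Since the estimated variance/covariance matrix, $\Sigma$, contains $\mathrm{var}(e_1)$, $\mathrm{var}(e_2)$, and
$\mathrm{cov}(e_1, e_2)$, the values of $b_{12}$, $\sigma_y^2$, and $\sigma_z^2$ can be identified recursively as $\sigma_z^2 = \mathrm{var}(e_2)$,
$b_{12} = -\mathrm{cov}(e_1, e_2)/\sigma_z^2$, and $\sigma_y^2 = \mathrm{var}(e_1) - b_{12}^2\sigma_z^2$. To put matters another way,
imposing the constraint means that the primitive system of (5.17) and (5.18) is
given by

\[
\begin{bmatrix} 1 & b_{12} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} y_t \\ z_t \end{bmatrix}
=
\begin{bmatrix} b_{10} \\ b_{20} \end{bmatrix}
+
\begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}
\]

Now, premultiplication of the primitive system by $B^{-1}$ yields

\[
\begin{bmatrix} y_t \\ z_t \end{bmatrix}
=
\begin{bmatrix} 1 & -b_{12} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} b_{10} \\ b_{20} \end{bmatrix}
+
\begin{bmatrix} 1 & -b_{12} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix}
+
\begin{bmatrix} 1 & -b_{12} \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}
\]

or

\[
\begin{bmatrix} y_t \\ z_t \end{bmatrix}
=
\begin{bmatrix} b_{10} - b_{12}b_{20} \\ b_{20} \end{bmatrix}
+
\begin{bmatrix} \gamma_{11} - b_{12}\gamma_{21} & \gamma_{12} - b_{12}\gamma_{22} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{yt} - b_{12}\varepsilon_{zt} \\ \varepsilon_{zt} \end{bmatrix}
\]

Estimating the system using OLS yields the parameter estimates from:

\[ y_t = a_{10} + a_{11} y_{t-1} + a_{12} z_{t-1} + e_{1t} \]
\[ z_t = a_{20} + a_{21} y_{t-1} + a_{22} z_{t-1} + e_{2t} \]

where $a_{10} = b_{10} - b_{12}b_{20}$, $a_{11} = \gamma_{11} - b_{12}\gamma_{21}$, $a_{12} = \gamma_{12} - b_{12}\gamma_{22}$, $a_{20} = b_{20}$, $a_{21} = \gamma_{21}$,
and $a_{22} = \gamma_{22}$.

Along with (5.31) through (5.33), we have nine parameter estimates $a_{10}$, $a_{11}$, $a_{12}$,
$a_{20}$, $a_{21}$, $a_{22}$, $\mathrm{var}(e_1)$, $\mathrm{var}(e_2)$, and $\mathrm{cov}(e_1, e_2)$, which can be substituted into the nine
equations above in order to simultaneously solve for $b_{10}$, $b_{12}$, $\gamma_{11}$, $\gamma_{12}$, $b_{20}$, $\gamma_{21}$, $\gamma_{22}$, $\sigma_y^2$,
and $\sigma_z^2$.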

Note also that the estimates of the $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ sequences can be recovered.
The residuals from the second equation (i.e., the $\{e_{2t}\}$ sequence) are estimates of the
$\{\varepsilon_{zt}\}$ sequence. Combining these estimates along with the solution for $b_{12}$ allows us to
calculate the estimates of the $\{\varepsilon_{yt}\}$ sequence using the relationship $e_{1t} = \varepsilon_{yt} - b_{12}\varepsilon_{zt}$.

Take a minute to examine the restriction. The assumption $b_{21} = 0$ means that $y_t$
does not have a contemporaneous effect on $z_t$. The restriction manifests itself such that
both $\varepsilon_{yt}$ and $\varepsilon_{zt}$ shocks affect the contemporaneous value of $y_t$ but only $\varepsilon_{zt}$ shocks affect
the contemporaneous value of $z_t$. The observed values of $e_{2t}$ are completely attributed to
pure shocks to the $\{z_t\}$ sequence. Decomposing the residuals in this triangular fashion
is called a Choleski decomposition.
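
The recursive solution of (5.31) through (5.33) is short enough to write out, and it can be compared with a numerical Cholesky factorization. This is a sketch under assumed values: it requires NumPy, the function names `identify_recursive` and `recover_structural_shocks` are hypothetical, and the variance/covariance matrix is made up. Because the restriction $b_{21} = 0$ places $z_t$ first in the causal ordering, the matrix is reordered before it is passed to `np.linalg.cholesky`.

```python
import numpy as np

def identify_recursive(Sigma):
    """Solve (5.31)-(5.33) for b12, sigma_y^2, sigma_z^2 under the restriction b21 = 0."""
    var_e1, cov_e12, var_e2 = Sigma[0, 0], Sigma[0, 1], Sigma[1, 1]
    sigma_z2 = var_e2                        # from (5.32)
    b12 = -cov_e12 / sigma_z2                # from (5.33)
    sigma_y2 = var_e1 - b12**2 * sigma_z2    # from (5.31)
    return b12, sigma_y2, sigma_z2

def recover_structural_shocks(e1, e2, b12):
    """Back out eps_y and eps_z from the residuals: e1 = eps_y - b12*eps_z, e2 = eps_z."""
    eps_z = np.asarray(e2, dtype=float)
    eps_y = np.asarray(e1, dtype=float) + b12 * eps_z
    return eps_y, eps_z

# Illustrative (made-up) covariance matrix implied by b12 = 0.4, sigma_y^2 = 1, sigma_z^2 = 4
Sigma = np.array([[1.64, -1.6], [-1.6, 4.0]])
b12, sigma_y2, sigma_z2 = identify_recursive(Sigma)

# Same information from a Cholesky factor, with z ordered first
P = np.linalg.cholesky(Sigma[::-1, ::-1])    # lower triangular, P @ P.T = reordered Sigma
b12_chol = -P[1, 0] / P[0, 0]
sigma_z_chol, sigma_y_chol = P[0, 0], P[1, 1]

eps_y, eps_z = recover_structural_shocks([0.5, -1.0], [1.0, 2.0], b12)
print(b12, sigma_y2, sigma_z2, b12_chol)
```

Reversing the ordering of the variables would instead impose $b_{12} = 0$, so the decomposition depends on the ordering chosen.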

In fact, the result is quite general. In an $n$-variable VAR, $B$ is an $n \times n$ matrix since
there are $n$ regression residuals and $n$ structural shocks. As shown in Section 10, exact
identification requires that $(n^2 - n)/2$ restrictions be placed on the relationship between
the regression residuals and the structural innovations. Since the Choleski decomposition
is triangular, it forces exactly $(n^2 - n)/2$ values of the $B$ matrix to equal zero.


7. THE IMPULSE RESPONSE FUNCTION 


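Just as an autoregression has a moving average representation, a vector autoregression
can be written as a vector moving average (VMA). In fact, equation (5.27) is the VMA
representation of (5.19) in that the variables (i.e., $y_t$ and $z_t$) are expressed in terms
of the current and past values of the two types of shocks (i.e., $e_{1t}$ and $e_{2t}$). The VMA
representation is an essential feature of Sims’s (1980) methodology in that it allows you
to trace out the time path of the various shocks on the variables contained in the VAR
system. For illustrative purposes, continue to use the two-variable/first-order model
analyzed in the previous two sections. Writing the two-variable VAR in matrix form,

\[
\begin{bmatrix} y_t \\ z_t \end{bmatrix}
=
\begin{bmatrix} a_{10} \\ a_{20} \end{bmatrix}
+
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix}
+
\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}
\tag{5.34}
\]

or, using (5.27), we get

\[
\begin{bmatrix} y_t \\ z_t \end{bmatrix}
=
\begin{bmatrix} \bar{y} \\ \bar{z} \end{bmatrix}
+
\sum_{i=0}^{\infty}
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^i
\begin{bmatrix} e_{1t-i} \\ e_{2t-i} \end{bmatrix}
\tag{5.35}
\]

Equation (5.35) expresses $y_t$ and $z_t$ in terms of the $\{e_{1t}\}$ and $\{e_{2t}\}$ sequences. However,
it is insightful to rewrite (5.35) in terms of the $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ sequences. From
(5.22) and (5.23), the vector of errors can be written as

\[
\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}
=
\frac{1}{1 - b_{12}b_{21}}
\begin{bmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{bmatrix}
\begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}
\tag{5.36}
\]

so that (5.35) and (5.36) can be combined to form

\[
\begin{bmatrix} y_t \\ z_t \end{bmatrix}
=
\begin{bmatrix} \bar{y} \\ \bar{z} \end{bmatrix}
+
\frac{1}{1 - b_{12}b_{21}}
\sum_{i=0}^{\infty}
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^i
\begin{bmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{bmatrix}
\begin{bmatrix} \varepsilon_{yt-i} \\ \varepsilon_{zt-i} \end{bmatrix}
\]

Since the notation is getting unwieldy, we can simplify by defining the 2 x 2 matrix
$\phi_i$ with elements $\phi_{jk}(i)$:

\[
\phi_i = \frac{A_1^i}{1 - b_{12}b_{21}}
\begin{bmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{bmatrix}
\]

Hence, the moving average representation of (5.35) and (5.36) can be written in
terms of the $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ sequences:

\[
\begin{bmatrix} y_t \\ z_t \end{bmatrix}
=
\begin{bmatrix} \bar{y} \\ \bar{z} \end{bmatrix}
+
\sum_{i=0}^{\infty}
\begin{bmatrix} \phi_{11}(i) & \phi_{12}(i) \\ \phi_{21}(i) & \phi_{22}(i) \end{bmatrix}
\begin{bmatrix} \varepsilon_{yt-i} \\ \varepsilon_{zt-i} \end{bmatrix}
\]

or, more compactly,

\[ x_t = \mu + \sum_{i=0}^{\infty} \phi_i \varepsilon_{t-i} \tag{5.37} \]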


The moving average representation is an especially useful tool to examine the
interaction between the $\{y_t\}$ and $\{z_t\}$ sequences. The coefficients of $\phi_i$ can be used
to generate the effects of $\varepsilon_{yt}$ and $\varepsilon_{zt}$ shocks on the entire time paths of the $\{y_t\}$ and
$\{z_t\}$ sequences. If you understand the notation, it should be clear that the four elements
$\phi_{jk}(0)$ are impact multipliers. For example, the coefficient $\phi_{12}(0)$ is the instantaneous
impact of a one-unit change in $\varepsilon_{zt}$ on $y_t$. In the same way, the elements $\phi_{11}(1)$ and
$\phi_{12}(1)$ are the one-period responses of unit changes in $\varepsilon_{yt-1}$ and $\varepsilon_{zt-1}$ on $y_t$, respectively.
Updating by one period indicates that $\phi_{11}(1)$ and $\phi_{12}(1)$ also represent the effects of
unit changes in $\varepsilon_{yt}$ and $\varepsilon_{zt}$ on $y_{t+1}$.

The accumulated effects of unit impulses in $\varepsilon_{yt}$ and/or $\varepsilon_{zt}$ can be obtained by
the appropriate summation of the coefficients of the impulse response functions. For
example, note that, after n periods, the effect of $\varepsilon_{zt}$ on the value of $y_{t+n}$ is $\phi_{12}(n)$. Thus,
after n periods, the cumulated sum of the effects of $\varepsilon_{zt}$ on the $\{y_t\}$ sequence is


$$
\sum_{i=0}^{n} \phi_{12}(i)
$$


Letting n approach infinity yields the total cumulated effect. If the $\{y_t\}$ and $\{z_t\}$
sequences are assumed to be stationary, it must be the case that for all j and k, the
values of $\phi_{jk}(i)$ converge to zero as i gets large. This follows as shocks cannot have a
permanent effect on a stationary series. It also follows that


$$
\sum_{i=0}^{\infty} \phi_{jk}(i) \ \text{is finite}
$$
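
As a worked step, the total cumulated effect has a closed form in the stationary case. The shorthand $B^{-1}$ below is an assumption of this sketch: it denotes the matrix that maps the structural shocks into the regression residuals in (5.36), so that $\phi_i = A_1^i B^{-1}$. Because all characteristic roots of $A_1$ lie inside the unit circle, the geometric matrix series converges:

```latex
\sum_{i=0}^{\infty}\phi_i
  = \Bigl(\sum_{i=0}^{\infty}A_1^{\,i}\Bigr)B^{-1}
  = (I - A_1)^{-1}B^{-1},
\qquad
B^{-1} = \frac{1}{1-b_{12}b_{21}}
\begin{bmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{bmatrix}
```

The (j, k) element of $(I - A_1)^{-1}B^{-1}$ is then the long-run cumulated response of variable j to a unit shock in innovation k.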
The four sets of coefficients $\phi_{11}(i)$, $\phi_{12}(i)$, $\phi_{21}(i)$, and $\phi_{22}(i)$ are called the impulse
response functions. Plotting the impulse response functions [i.e., plotting the coefficients
of $\phi_{jk}(i)$ against i] is a practical way to visually represent the behavior of
the $\{y_t\}$ and $\{z_t\}$ series in response to the various shocks. In principle, it might be
possible to know all of the parameters of the primitive system (5.17) and (5.18). With
such knowledge, it would be possible to trace out the time paths of the effects of pure
$\varepsilon_{yt}$ or $\varepsilon_{zt}$ shocks. However, this methodology is not available to the researcher since
an estimated VAR is underidentified. As explained in the previous section, knowledge
of the various $a_{ij}$ and the variance/covariance matrix $\Sigma$ is not sufficient to identify the
primitive system. Hence, the econometrician must impose an additional restriction on
the two-variable VAR system in order to identify the impulse responses.

One possible identification restriction is to impose the recursive ordering (or
Choleski decomposition) used in (5.31), such that $y_t$ does not have a contemporaneous
effect on $z_t$. Formally, this restriction is represented by setting $b_{21} = 0$ in the primitive
system. In terms of (5.36), the error terms can be decomposed as follows:


$$
e_{1t} = \varepsilon_{yt} - b_{12}\varepsilon_{zt} \tag{5.38}
$$
$$
e_{2t} = \varepsilon_{zt} \tag{5.39}
$$


As already noted, if we use (5.39), all of the observed errors from the $\{e_{2t}\}$
sequence are attributed to $\varepsilon_{zt}$ shocks. Given the calculated $\{\varepsilon_{zt}\}$ sequence, knowledge
of the values of the $\{e_{1t}\}$ sequence and the correlation coefficient between $e_{1t}$ and $e_{2t}$
allows for the calculation of the $\{\varepsilon_{yt}\}$ sequence using (5.38). Although this Choleski
decomposition constrains the system such that an $\varepsilon_{yt}$ shock has no direct effect on $z_t$,
there is an indirect effect in that lagged values of $y_t$ affect the contemporaneous value
of $z_t$. The key point is that the decomposition forces a potentially important asymmetry
on the system since an $\varepsilon_{zt}$ shock has contemporaneous effects on both $y_t$ and $z_t$. For
this reason, (5.38) and (5.39) are said to be an ordering of the variables. An $\varepsilon_{zt}$ shock
directly affects $e_{1t}$ and $e_{2t}$, but an $\varepsilon_{yt}$ shock does not affect $e_{2t}$. Hence, $z_t$ is said to be
“causally prior” to $y_t$.

Suppose that estimates of equations (5.20) and (5.21) yield the values
$a_{10} = a_{20} = 0$, $a_{11} = a_{22} = 0.7$, and $a_{12} = a_{21} = 0.2$. You will recall that this is
precisely the model used in the simulation reported in Panel (a) of Figure 5.6. Also,
suppose that the elements of the $\Sigma$ matrix are such that $\sigma_1^2 = \sigma_2^2$ and that $\mathrm{cov}(e_{1t}, e_{2t})$ is
such that the correlation coefficient between $e_{1t}$ and $e_{2t}$ (denoted by $\rho_{12}$) is 0.8. Hence,
the decomposed errors can be represented by*


$$
e_{1t} = \varepsilon_{yt} + 0.8\varepsilon_{zt} \tag{5.40}
$$
$$
e_{2t} = \varepsilon_{zt} \tag{5.41}
$$


Panels (a) and (b) of Figure 5.7 trace out the effects of one-unit shocks to $\varepsilon_{zt}$ and $\varepsilon_{yt}$
on the time paths of the $\{y_t\}$ and $\{z_t\}$ sequences. As shown in Panel (a), a one-unit shock
in $\varepsilon_{zt}$ causes $z_t$ to jump by one unit and $y_t$ to jump by 0.8 units. [From (5.40), 80% of the
$\varepsilon_{zt}$ shock has a contemporaneous effect on $e_{1t}$.] In the next period, $\varepsilon_{zt+1}$ returns to zero,
but the autoregressive nature of the system is such that $y_{t+1}$ and $z_{t+1}$ do not immediately
return to their long-run values. Since $z_{t+1} = 0.2y_t + 0.7z_t + \varepsilon_{zt+1}$, it follows that $z_{t+1} = 0.86$
[0.2(0.8) + 0.7(1) = 0.86]. Similarly, $y_{t+1} = 0.7y_t + 0.2z_t = 0.76$. As you can see
from the figure, the subsequent values of the $\{y_t\}$ and $\{z_t\}$ sequences converge to their
long-run levels. This convergence is assured by the stability of the system; as found
earlier, the two characteristic roots are 0.5 and 0.9.

Legend: Solid line = $\{y_t\}$ sequence. Cross-hatch = $\{z_t\}$ sequence.
Note: In all cases, $e_{1t} = 0.8e_{2t} + \varepsilon_{yt}$ and $e_{2t} = \varepsilon_{zt}$.


FIGURE 5.7 Two Impulse Response Functions


The effects of a one-unit shock in $\varepsilon_{yt}$ are shown in Panel (b) of the figure. You
can see the asymmetry of the decomposition immediately by comparing the two upper
graphs. A one-unit shock in $\varepsilon_{yt}$ causes the value of $y_t$ to increase by one unit; however,
there is no contemporaneous effect on the value of $z_t$ so that $y_t = 1$ and $z_t = 0$. In the
subsequent period, $\varepsilon_{yt+1}$ returns to zero. The autoregressive nature of the system is such
that $y_{t+1} = 0.7y_t + 0.2z_t = 0.7$ and $z_{t+1} = 0.2y_t + 0.7z_t = 0.2$. The remaining points in
the figure are the impulse responses for periods t + 2 through t + 20. Since the system
is stationary, the impulse responses ultimately decay.
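
The arithmetic behind Panels (a) and (b) can be reproduced in a few lines. This is a minimal sketch, assuming the coefficient values and the ordering (5.40)-(5.41) given above; the function name `impulse_responses` and the array layout are illustrative choices, not part of the text.

```python
import numpy as np

# Model 1 from the text: a11 = a22 = 0.7 and a12 = a21 = 0.2.
A1 = np.array([[0.7, 0.2],
               [0.2, 0.7]])

# Ordering (5.40)-(5.41): e1t = eps_y + 0.8*eps_z and e2t = eps_z.
# Rows correspond to (y, z); columns to the shocks (eps_y, eps_z).
impact = np.array([[1.0, 0.8],
                   [0.0, 1.0]])

def impulse_responses(A1, impact, horizon):
    """Return phi[i] = A1**i @ impact for i = 0, ..., horizon."""
    phi = np.empty((horizon + 1, 2, 2))
    phi[0] = impact
    for i in range(1, horizon + 1):
        phi[i] = A1 @ phi[i - 1]
    return phi

phi = impulse_responses(A1, impact, horizon=20)

# Column 1 of phi[i] holds the responses of (y, z) to a unit eps_z shock
# i periods earlier (Panel a); column 0 holds the responses to eps_y (Panel b).
print(np.round(phi[0], 2))
print(np.round(phi[1], 2))
```

The one-period-ahead matrix reproduces the values in the text: 0.76 and 0.86 for the $\varepsilon_{zt}$ shock, and 0.7 and 0.2 for the $\varepsilon_{yt}$ shock. Changing the off-diagonal elements of `A1` to -0.2 gives the second model shown in Panels (c) and (d).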

Can you figure out the consequences of reversing the Choleski decomposition in
such a way that $b_{12}$, rather than $b_{21}$, is constrained to equal zero? Since matrix $A_1$ is
symmetric (i.e., $a_{11} = a_{22}$ and $a_{12} = a_{21}$), the impulse responses of an $\varepsilon_{yt}$ shock would
be similar to those in Panel (a) and the impulse responses of an $\varepsilon_{zt}$ shock would be similar to
those in Panel (b). The only difference would be that the solid line would represent the
time path of the $\{z_t\}$ sequence and the hatched line would represent the time path of
the $\{y_t\}$ sequence.
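
The mirror-image claim can be verified directly. The sketch below assumes that the reversed ordering uses the same 0.8 coefficient (which holds here because the two residual variances are equal); the names `impact_z_first`, `impact_y_first`, and `irf` are hypothetical.

```python
import numpy as np

A1 = np.array([[0.7, 0.2],
               [0.2, 0.7]])

# Rows are (y, z); columns are the shocks (eps_y, eps_z).
impact_z_first = np.array([[1.0, 0.8],   # b21 = 0: eps_z moves y and z at once
                           [0.0, 1.0]])
impact_y_first = np.array([[1.0, 0.0],   # b12 = 0: eps_y moves y and z at once
                           [0.8, 1.0]])

def irf(A1, impact, horizon):
    """Stack the matrices A1**i @ impact for i = 0, ..., horizon."""
    out = [impact]
    for _ in range(horizon):
        out.append(A1 @ out[-1])
    return np.array(out)

phi_original = irf(A1, impact_z_first, 20)
phi_reversed = irf(A1, impact_y_first, 20)

# Exchange the roles of y and z (rows) and of the two shocks (columns).
swap = np.array([[0, 1],
                 [1, 0]])
mirrored = swap @ phi_original @ swap

print(np.allclose(mirrored, phi_reversed))
```

The comparison holds only because $A_1$ is symmetric with equal diagonal elements; with a general coefficient matrix, the two orderings would not be simple relabelings of each other.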

As a practical matter, how does the researcher decide which of the alternative 
decompositions is most appropriate? In some instances, there might be a theoretical 
reason to suppose that one variable has no contemporaneous effect on the other. In the 
terrorism/tourism example, knowledge that terrorist incidents affect tourism with a lag 
suggests that terrorism does not have a contemporaneous effect on tourism. Usually, 
there is no such a priori knowledge. Moreover, the very idea of imposing a structure 
on a VAR system seems contrary to the spirit of Sims’s argument against “incredible 
identifying restrictions.” Unfortunately, there is no simple way to circumvent the prob- 
lem; identification necessitates imposing some structure on the system. The Choleski 
decomposition provides a minimal set of assumptions that can be used to identify the 
structural model. 

It is crucial to note that the importance of the ordering depends on the magnitude
of the correlation coefficient between $e_{1t}$ and $e_{2t}$. Let this correlation coefficient
be denoted by $\rho_{12}$ so that $\rho_{12} = \sigma_{12}/(\sigma_1\sigma_2)$. Now suppose that the estimated model
yields a value of $\Sigma$ such that $\rho_{12}$ is found to be equal to zero. In this circumstance,
the ordering would be immaterial. Formally, (5.38) and (5.39) become $e_{1t} = \varepsilon_{yt}$ and
$e_{2t} = \varepsilon_{zt}$. Since there is no correlation across equations, the residuals from the $y_t$ and
$z_t$ equations are necessarily equivalent to the $\varepsilon_{yt}$ and $\varepsilon_{zt}$ shocks, respectively. The point
is that if $Ee_{1t}e_{2t} = 0$, $b_{12}$ and $b_{21}$ can both be set equal to zero. At the other extreme, if
$\rho_{12}$ is found to be unity, there is a single shock that contemporaneously affects both
variables. When $\rho_{12} = 1$ and maintaining the assumption $b_{21} = 0$, (5.38) and (5.39) become
$e_{1t} = \varepsilon_{zt}$ and $e_{2t} = \varepsilon_{zt}$, and under the alternative assumption $b_{12} = 0$, it follows that
$e_{1t} = \varepsilon_{yt}$ and $e_{2t} = \varepsilon_{yt}$. Usually, the researcher will want to test the significance of $\rho_{12}$.
As in a univariate model, you can test the null hypothesis $\rho_{12} = 0$ using a normal
distribution with a mean of zero and a standard deviation of $T^{-0.5}$. As such, with 100 usable
observations, if $|\rho_{12}| > 0.2$, the correlation is deemed to be significant at conventional
levels. If $\rho_{12}$ is significant, the usual procedure is to obtain the impulse response
function using a particular ordering. Compare the results to the impulse response function
obtained by reversing the ordering. If the implications are quite different, additional
investigation into the relationships between the variables is necessary.
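
The rule of thumb for the correlation test is easy to code. This is a minimal sketch: the function name `correlation_is_significant` and the default of two standard deviations are assumptions made for illustration.

```python
import math

def correlation_is_significant(rho, T, n_sd=2.0):
    """Compare |rho| with n_sd standard deviations, each equal to T**-0.5."""
    threshold = n_sd / math.sqrt(T)
    return abs(rho) > threshold, threshold

# With 100 usable observations the two-standard-deviation cutoff is 0.2.
flag, threshold = correlation_is_significant(rho=0.8, T=100)
print(flag, threshold)
```

If the flag is true, the ordering matters, and the impulse responses should be computed under both orderings and compared.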

The lower half of Figure 5.7, Panels (c) and (d), shows the impulse response
functions for a second model; the sole difference between models 1 and 2 is the change
in the values of $a_{12}$ and $a_{21}$ to −0.2. Notice that this model was used in the simulation
reported in Panel (b) of Figure 5.6. The negative off-diagonal elements of $A_1$ weaken
the tendency for the two series to move together. Panel (c) traces out the effect of
a one-unit $\varepsilon_{zt}$ shock using the ordering represented by (5.40) and (5.41). In period t, $z_t$
rises by one unit and $y_t$ rises by 0.8 units. In period (t + 1), $\varepsilon_{zt+1}$ returns to zero but
the value of $y_{t+1}$ is $0.7y_t - 0.2z_t = 0.36$ and the value of $z_{t+1}$ is $-0.2y_t + 0.7z_t = 0.54$.
The points represented by t + 2 through t + 20 show that the impulse responses converge
to zero. Panel (d) traces the responses of a one-unit $\varepsilon_{yt}$ shock. Since the value
of $z_t$ is unaffected by the shock, in period (t + 1), $y_{t+1} = 0.7y_t - 0.2z_t = 0.7$ and
$z_{t+1} = -0.2y_t + 0.7z_t = -0.2$. In the same way, $y_{t+2} = 0.7 \times 0.7 - 0.2 \times (-0.2) = 0.53$
and $z_{t+2} = -0.2 \times (0.7) + 0.7 \times (-0.2) = -0.28$. Since the system is stable, both
sequences eventually converge to zero.


Confidence Intervals and Impulse Responses 


One key issue concerning the impulse response functions is that they are constructed 
using the estimated coefficients. Since each coefficient is estimated imprecisely, the 
impulse responses also contain error. The issue is to construct confidence intervals 
around the impulse responses that allow for the parameter uncertainty inherent in the 
estimation process. To illustrate the methodology, consider the following estimate of 
an AR(1) model: 


$$
y_t = \underset{(4.00)}{0.60}\,y_{t-1} + \varepsilon_t
$$


Given the t-statistic of 4.00, the AR(1) coefficient seems to be well estimated. It is
easy to form the impulse response function: For any given level of $y_{t-1}$, a one-unit shock
to $\varepsilon_t$ will increase $y_t$ by one unit. In subsequent periods, $y_{t+1}$ will be 0.60 and $y_{t+2}$ will
be $(0.60)^2$. As you can easily verify, the impulse response function can be written as
$\phi(i) = (0.60)^i$.

Notice that the point estimate of the AR(1) coefficient is 0.6 with a standard deviation
of 0.15 (0.15 = 0.60/4.00). If we are willing to assume that the coefficient is
normally distributed, there is a 95% chance that the actual value lies within the two
standard deviation interval 0.3 to 0.9. As such, the decay pattern could be anywhere
between $\phi(i) = (0.90)^i$ and $\phi(i) = (0.30)^i$. The problem is much more complicated in
higher-order systems since the estimated coefficients will be correlated. Moreover, you
may not want to assume normality. One way to obtain the desired confidence intervals
from the AR(p) process $y_t = a_0 + a_1y_{t-1} + \cdots + a_py_{t-p} + \varepsilon_t$ is to perform the following
Monte Carlo study:


1. Estimate the coefficients $a_0$ through $a_p$ using OLS and save the residuals.
Let $\hat{a}_i$ denote the estimated value of $a_i$ and let $\{\hat{\varepsilon}_t\}$ denote the estimated
residuals.

2. For a sample size of T, draw T random numbers so as to represent the $\{\varepsilon_t\}$
sequence. Most software packages will draw the numbers using randomly
selected values of $\hat{\varepsilon}_t$ (with replacement). In this way, they actually generate
bootstrap confidence intervals. Thus, you will have a simulated series
of length T, called $\varepsilon_t^s$, which should have the same properties as the true
error process. Use these random numbers to construct the simulated $\{y_t^s\}$
sequence as

$$
y_t^s = \hat{a}_0 + \hat{a}_1y_{t-1}^s + \cdots + \hat{a}_py_{t-p}^s + \varepsilon_t^s
$$


Be sure that you appropriately initialize the series so as to eliminate the
effects of the initial conditions.

3. Now act as if you did not know the coefficient values used to generate the
$y_t^s$ series. Estimate $y_t^s$ as an AR(p) process and obtain the impulse response
function. If you repeat the process several thousand times, you can generate
several thousand impulse response functions. You use these impulse response
functions to construct the confidence intervals. For example, you can construct
the interval that excludes the lowest 2.5% and highest 2.5% of the
responses to obtain a 95% confidence interval.


The benefit of this method is that you do not need to make any special assumptions 
concerning the distribution of the autoregressive coefficients. The actual calculation of 
confidence intervals is only a bit more complicated in a VAR. Consider the two-variable 
system: 


$$
\begin{aligned}
y_t &= a_{11}y_{t-1} + a_{12}z_{t-1} + e_{1t} \\
z_t &= a_{21}y_{t-1} + a_{22}z_{t-1} + e_{2t}
\end{aligned}
$$


The complicating issue is that the regression residuals are correlated. As such, you
need to draw $e_{1t}$ and $e_{2t}$ so as to maintain the appropriate error structure. A simple
method is to draw $e_{1t}$ and use the value of $e_{2t}$ that corresponds to that same period. If
you use a Choleski decomposition such that $b_{21} = 0$, construct $\varepsilon_{yt}$ and $\varepsilon_{zt}$ using (5.38)
and (5.39). Figure 5.8 reports confidence intervals from a two-variable VAR that has
been estimated using the domestic and transnational terrorism data shown in Figure 5.1.
You can see that the responses of domestic terrorism to transnational shocks are never
significant.
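
Drawing both residuals from the same period amounts to resampling whole rows of the residual matrix. The sketch below uses made-up residuals in place of estimated ones; the array names `resid` and `resid_star` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical residuals: row t holds (e1t, e2t). A common component makes
# the two columns contemporaneously correlated, as VAR residuals usually are.
T = 200
common = rng.standard_normal(T)
resid = np.column_stack([common + 0.5 * rng.standard_normal(T),
                         common + 0.5 * rng.standard_normal(T)])

# Draw time indices with replacement and keep both residuals from each
# drawn period, so the cross-equation correlation carries over.
idx = rng.integers(0, T, size=T)
resid_star = resid[idx]

print(np.corrcoef(resid.T)[0, 1], np.corrcoef(resid_star.T)[0, 1])
```

Resampling the two columns independently would destroy the correlation and understate the contemporaneous link between the two equations.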


Variance Decomposition 


Another useful aid in uncovering interrelationships among the variables in the system
is a forecast error variance decomposition. Suppose that we knew the coefficients of
$A_0$ and $A_1$ and wanted to forecast the various values of $x_{t+i}$ conditional on the observed
value of $x_t$. Updating (5.19) one period (i.e., $x_{t+1} = A_0 + A_1x_t + e_{t+1}$) and taking the
conditional expectation of $x_{t+1}$, we obtain

$$
E_tx_{t+1} = A_0 + A_1x_t
$$

Note that the one-step-ahead forecast error is $x_{t+1} - E_tx_{t+1} = e_{t+1}$. Similarly,
updating two periods, we get

$$
\begin{aligned}
x_{t+2} &= A_0 + A_1x_{t+1} + e_{t+2} \\
&= A_0 + A_1(A_0 + A_1x_t + e_{t+1}) + e_{t+2}
\end{aligned}
$$

If we take conditional expectations, the two-step-ahead forecast of $x_{t+2}$ is


$$
E_tx_{t+2} = (I + A_1)A_0 + A_1^2x_t
$$


The two-step-ahead forecast error (i.e., the difference between the realization of
$x_{t+2}$ and the forecast) is $e_{t+2} + A_1e_{t+1}$. More generally, it is easily verified that the
n-step-ahead forecast is


$$
E_tx_{t+n} = (I + A_1 + A_1^2 + \cdots + A_1^{n-1})A_0 + A_1^nx_t
$$

and that the associated forecast error is

$$
e_{t+n} + A_1e_{t+n-1} + A_1^2e_{t+n-2} + \cdots + A_1^{n-1}e_{t+1}
\tag{5.42}
$$


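We can also consider these forecast errors in terms of (5.37) (i.e., the VMA form
of the structural model). Of course, the VMA and the VAR models contain exactly the
same information but it is convenient (and a good exercise) to describe the properties
of the forecast errors in terms of the $\{\varepsilon_t\}$ sequence. If we use (5.37) to conditionally
forecast $x_{t+1}$ one step ahead, the forecast error is $\phi_0\varepsilon_{t+1}$. In general,


$$
x_{t+n} = \mu + \sum_{i=0}^{\infty} \phi_i\varepsilon_{t+n-i}
$$


so that the n-period forecast error $x_{t+n} - E_tx_{t+n}$ is


$$
x_{t+n} - E_tx_{t+n} = \sum_{i=0}^{n-1} \phi_i\varepsilon_{t+n-i}
$$


Focusing solely on the $\{y_t\}$ sequence, we see that the n-step-ahead forecast error
is

$$
\begin{aligned}
y_{t+n} - E_ty_{t+n} = {} & \phi_{11}(0)\varepsilon_{yt+n} + \phi_{11}(1)\varepsilon_{yt+n-1} + \cdots + \phi_{11}(n-1)\varepsilon_{yt+1} \\
& + \phi_{12}(0)\varepsilon_{zt+n} + \phi_{12}(1)\varepsilon_{zt+n-1} + \cdots + \phi_{12}(n-1)\varepsilon_{zt+1}
\end{aligned}
$$

Denote the n-step-ahead forecast error variance of $y_{t+n}$ as $\sigma_y(n)^2$:


$$
\begin{aligned}
\sigma_y(n)^2 = {} & \sigma_y^2[\phi_{11}(0)^2 + \phi_{11}(1)^2 + \cdots + \phi_{11}(n-1)^2] \\
& + \sigma_z^2[\phi_{12}(0)^2 + \phi_{12}(1)^2 + \cdots + \phi_{12}(n-1)^2]
\end{aligned}
$$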
Because all values of $\phi_{jk}(i)^2$ are necessarily nonnegative, the variance of the forecast
error increases as the forecast horizon n increases. Note that it is possible to decompose
the n-step-ahead forecast error variance into the proportions due to each shock.
The proportions of $\sigma_y(n)^2$ due to shocks in the $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ sequences are


$$
\frac{\sigma_y^2[\phi_{11}(0)^2 + \phi_{11}(1)^2 + \cdots + \phi_{11}(n-1)^2]}{\sigma_y(n)^2}
$$


and

$$
\frac{\sigma_z^2[\phi_{12}(0)^2 + \phi_{12}(1)^2 + \cdots + \phi_{12}(n-1)^2]}{\sigma_y(n)^2}
$$

respectively.
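
A short numerical illustration of the two proportions, using the first model and the ordering (5.40)-(5.41). The shock variances are an assumption of this sketch: normalizing $\sigma_1^2 = \sigma_2^2 = 1$ with $\rho_{12} = 0.8$ implies $\mathrm{var}(\varepsilon_{zt}) = 1$ and $\mathrm{var}(\varepsilon_{yt}) = 1 - 0.8^2 = 0.36$. The function name `fevd_y` is hypothetical.

```python
import numpy as np

A1 = np.array([[0.7, 0.2],
               [0.2, 0.7]])
impact = np.array([[1.0, 0.8],    # phi_0 under the ordering (5.40)-(5.41)
                   [0.0, 1.0]])
sigma_y2, sigma_z2 = 0.36, 1.0    # assumed variances of eps_y and eps_z

def fevd_y(A1, impact, sigma_y2, sigma_z2, n):
    """Shares of the n-step forecast error variance of y due to eps_y, eps_z."""
    own = other = 0.0
    phi = impact.copy()
    for _ in range(n):
        own += sigma_y2 * phi[0, 0] ** 2     # phi_11(i)^2 term
        other += sigma_z2 * phi[0, 1] ** 2   # phi_12(i)^2 term
        phi = A1 @ phi                       # move to phi_{i+1}
    total = own + other
    return own / total, other / total

for n in (1, 4, 8):
    print(n, np.round(fevd_y(A1, impact, sigma_y2, sigma_z2, n), 3))
```

At the one-step horizon the shares are 0.36 and 0.64, which simply restates that $\rho_{12}^2 = 0.64$ of the variance of $e_{1t}$ is attributed to $\varepsilon_{zt}$ under this ordering.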


The forecast error variance decomposition tells us the proportion of the movements
in a sequence due to its “own” shocks versus shocks to the other variable. If
$\varepsilon_{zt}$ shocks explain none of the forecast error variance of $\{y_t\}$ at all forecast horizons,
we can say that the $\{y_t\}$ sequence is exogenous. In this circumstance, $\{y_t\}$ evolves
independently of the $\varepsilon_{zt}$ shocks and the $\{z_t\}$ sequence. At the other extreme, $\varepsilon_{zt}$ shocks
could explain all of the forecast error variance in the $\{y_t\}$ sequence at all forecast horizons,
so that $\{y_t\}$ would be entirely endogenous. In applied research, it is typical for a
variable to explain almost all of its forecast error variance at short horizons and smaller
proportions at longer horizons. We would expect this pattern if $\varepsilon_{zt}$ shocks had little
contemporaneous effect on $y_t$ but acted to affect the $\{y_t\}$ sequence with a lag.

Note that the variance decomposition contains the same problem inherent in
impulse response function analysis. In order to identify the $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ sequences,
it is necessary to restrict the B matrix. The Choleski decomposition used in (5.38) and
(5.39) necessitates that all of the one-period forecast error variance of $z_t$ is due to $\varepsilon_{zt}$.
If we use the alternative ordering, all of the one-period forecast error variance of $y_t$
would be due to $\varepsilon_{yt}$. The effects of these alternative assumptions are reduced at longer
forecasting horizons. In practice, it is useful to examine the variance decompositions
at various forecast horizons. As n increases, the variance decompositions should
converge. Moreover, if the correlation coefficient $\rho_{12}$ is significantly different from
zero, it is customary to obtain the variance decompositions under various orderings.

Nevertheless, impulse analysis and variance decompositions (together called innovation
accounting) can be useful tools to examine the relationships among economic
variables. If the correlations among the various innovations are small, the identification
problem is not likely to be especially important. The alternative orderings should yield
similar impulse responses and variance decompositions. Of course, the contemporaneous
movements of many economic variables are highly correlated. Sections 10-13
consider two attractive methods that can be used to identify the structural innovations.
Before examining these techniques, we consider hypothesis testing in a VAR
framework and reexamine the interrelationships between domestic and transnational
terrorism.


8. TESTING HYPOTHESES 


In principle, there is nothing to prevent you from incorporating a large number of vari- 
ables in the VAR. It is possible to construct an n-equation VAR with each equation 
containing p lags of all n variables in the system. You will want to include those vari- 
ables that have important economic effects on each other. As a practical matter, degrees 
of freedom are quickly eroded as more variables are included. For example, using 
monthly data with 12 lags, the inclusion of one additional variable uses an additional 12 
degrees of freedom in each equation. A careful examination of the relevant theoretical 
model will help you to select the set of variables to include in your VAR model. 
An n-equation VAR can be represented by 


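$$
\begin{bmatrix} x_{1t} \\ x_{2t} \\ \vdots \\ x_{nt} \end{bmatrix}
= \begin{bmatrix} A_{10} \\ A_{20} \\ \vdots \\ A_{n0} \end{bmatrix}
+ \begin{bmatrix}
A_{11}(L) & A_{12}(L) & \cdots & A_{1n}(L) \\
A_{21}(L) & A_{22}(L) & \cdots & A_{2n}(L) \\
\vdots & \vdots & & \vdots \\
A_{n1}(L) & A_{n2}(L) & \cdots & A_{nn}(L)
\end{bmatrix}
\begin{bmatrix} x_{1t-1} \\ x_{2t-1} \\ \vdots \\ x_{nt-1} \end{bmatrix}
+ \begin{bmatrix} e_{1t} \\ e_{2t} \\ \vdots \\ e_{nt} \end{bmatrix}
\tag{5.43}
$$


where $A_{i0}$ = the parameters representing intercept terms
$A_{ij}(L)$ = the polynomials in the lag operator L.


The individual coefficients of $A_{ij}(L)$ are denoted by $a_{ij}(1)$, $a_{ij}(2)$, .... Since all
equations have the same lag length, the polynomials $A_{ij}(L)$ are all of the same degree.
The terms $e_{it}$ are white-noise disturbances that may be correlated with each other.
Again, designate the variance/covariance matrix by $\Sigma$, where the dimension of $\Sigma$ is
$(n \times n)$.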

In addition to the determination of the set of variables to include in the VAR, it is 
important to determine the appropriate lag length. One possible procedure is to allow 
for different lag lengths for each variable in each equation. However, in order to pre- 
serve the symmetry of the system (and to be able to use OLS efficiently), it is common 
to use the same lag length for all equations. As indicated in Section 6, as long as there 
are identical regressors in each equation, OLS estimates are consistent and asymptoti- 
cally efficient. If some of the VAR equations have regressors not included in the others, 
seemingly unrelated regressions (SUR) provide efficient estimates of the VAR coeffi- 
cients. Hence, when there is a good reason to let lag lengths differ across equations, 
estimate the so-called near-VAR using SUR. 

In a VAR, long-lag lengths quickly consume degrees of freedom. If lag length is 
p, each of the n equations contains np coefficients plus the intercept term. Appropri- 
ate lag length selection can be critical. If p is too small, the model is misspecified; 
if p is too large, degrees of freedom are wasted. To check lag length, begin with the 
longest plausible length or the longest feasible length given degrees-of-freedom con- 
siderations. Estimate the VAR and form the variance/covariance matrix of the resid- 
uals. Using quarterly data, you might start with a lag length of 12 quarters based on 
the a priori notion that 3 years is sufficiently long to capture the system’s dynamics.
Call the variance/covariance matrix of the residuals from the 12-lag model $\Sigma_{12}$. Now
suppose you want to determine whether eight lags are appropriate. After all, restricting
the model from 12 to 8 lags would reduce the number of estimated parameters by 4n
in each equation.

Since the goal is to determine whether lag 8 is appropriate for all equations, an
equation by equation F-test on lags 9 through 12 is not appropriate. Instead, the proper
test for this cross-equation restriction is a likelihood ratio test. Reestimate the VAR
over the same sample period using eight lags and obtain the variance/covariance matrix
of the residuals $\Sigma_8$. Note that $\Sigma_8$ pertains to a system of n equations with 4n restrictions
in each equation, for a total of $4n^2$ restrictions. The likelihood ratio statistic is


$$
T(\ln|\Sigma_8| - \ln|\Sigma_{12}|)
$$


However, given the sample sizes usually found in economic analysis, Sims (1980)
recommended using

$$
(T - c)(\ln|\Sigma_8| - \ln|\Sigma_{12}|)
$$


where T is the number of usable observations, c is the number of parameters estimated in each
equation of the unrestricted system, and $\ln|\Sigma_n|$ is the natural logarithm of the determinant
of $\Sigma_n$.

In the example at hand, c = 1 + 12n since each equation of the unrestricted model 
has 12 lags for each variable plus an intercept. 

This statistic has an asymptotic $\chi^2$ distribution with degrees of freedom equal to
the number of restrictions in the system. In the example under consideration, there are
4n restrictions in each equation, for a total of $4n^2$ restrictions in the system. Clearly, if
the restriction of a reduced number of lags is not binding, we would expect $\ln|\Sigma_8|$ to
be equal to $\ln|\Sigma_{12}|$. Large values of this sample statistic indicate that having only eight
lags is a binding restriction; hence, we can reject the null hypothesis that lag length = 8.
If the calculated value of the statistic is less than $\chi^2$ at a prespecified significance level,
we will not be able to reject the null of only eight lags. At that point, we could seek to
determine whether four lags were appropriate by constructing


$$
(T - c)(\ln|\Sigma_4| - \ln|\Sigma_8|)
$$


Considerable care should be taken in paring down lag length in this manner. Often, 
this procedure will not reject the null hypotheses of 8 versus 12 lags and 4 versus 8 
lags, although it will reject a null of 4 versus 12 lags. The problem with paring down 
the model is that you may lose a small amount of explanatory power at each stage. 
Overall, the total loss in explanatory power can be significant. In such circumstances, 
it is better to use the longer lag lengths. 
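
The lag-length test can be sketched with plain NumPy. Everything named here is an assumption made for illustration: the helper `fit_var`, the function `sims_lr`, the simulated two-variable data, and the choice to divide the residual cross-product by the number of usable observations when forming the covariance matrix.

```python
import numpy as np

def fit_var(x, p, start):
    """OLS for a VAR(p) with intercept on observations start, ..., T-1.
    Returns the residual covariance matrix and the number of observations."""
    T = x.shape[0]
    Y = x[start:]
    X = np.column_stack([np.ones(T - start)] +
                        [x[start - k:T - k] for k in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    return E.T @ E / len(Y), len(Y)

def sims_lr(x, p_restricted, p_unrestricted):
    """(T - c)(ln|Sigma_r| - ln|Sigma_u|) and its degrees of freedom."""
    n = x.shape[1]
    # Both models use the same sample: the first p_unrestricted rows are
    # reserved as initial conditions for either lag length.
    sigma_r, T = fit_var(x, p_restricted, start=p_unrestricted)
    sigma_u, _ = fit_var(x, p_unrestricted, start=p_unrestricted)
    c = 1 + n * p_unrestricted            # parameters per unrestricted equation
    logdet_r = np.linalg.slogdet(sigma_r)[1]
    logdet_u = np.linalg.slogdet(sigma_u)[1]
    stat = (T - c) * (logdet_r - logdet_u)
    df = n * n * (p_unrestricted - p_restricted)
    return stat, df

# Simulated VAR(1) data, for illustration only.
rng = np.random.default_rng(0)
A1 = np.array([[0.7, 0.2],
               [0.2, 0.7]])
x = np.zeros((250, 2))
for t in range(1, 250):
    x[t] = A1 @ x[t - 1] + rng.standard_normal(2)

stat, df = sims_lr(x, p_restricted=2, p_unrestricted=4)
print(round(stat, 2), df)
```

The statistic is then compared with a $\chi^2$ critical value with `df` degrees of freedom, taken from a table or, if SciPy is installed, from `scipy.stats.chi2.ppf`.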

This type of likelihood ratio test is applicable to any type of cross-equation
restriction. Let $\Sigma_u$ and $\Sigma_r$ be the variance/covariance matrices of the unrestricted and
restricted systems, respectively. If the equations of the unrestricted model contain
different regressors, let c denote the maximum number of regressors contained in the
longest equation. Sims’s recommendation is to compare the test statistic


$$
(T - c)(\ln|\Sigma_r| - \ln|\Sigma_u|)
\tag{5.44}
$$




to a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions in the
system.

To take another example, suppose you wanted to capture seasonal effects by includ- 
ing three seasonal dummies in each of the n equations of a VAR. Estimate the unre- 
stricted model by including the dummy variables and estimate the restricted model 
by excluding the dummies. The total number of restrictions in the system is 3n. If 
lag length is p, the equations of the unrestricted model have np + 4 parameters (np 
lagged variables, the intercept, and the three seasonals). For T usable observations, set 
c = np + 4 and calculate the value of (5.44). If for some prespecified significance level
this calculated value $\chi^2$ (with 3n degrees of freedom) exceeds the critical value, the
restriction of no seasonal effects can be rejected.

The likelihood ratio test is based on asymptotic theory, which may not be very 
useful in the small samples available to time-series econometricians. Moreover, the 
likelihood ratio test is only applicable when one model is a restricted version of the 
other. Alternative test criteria are the multivariate generalizations of the AIC and SBC: 


$$
\begin{aligned}
\mathrm{AIC} &= T\ln|\Sigma| + 2N \\
\mathrm{SBC} &= T\ln|\Sigma| + N\ln(T)
\end{aligned}
$$


where $|\Sigma|$ = determinant of the variance/covariance matrix of the residuals


N = total number of parameters estimated in all equations.


Thus, if each equation in an n-variable VAR has p lags and an intercept,
$N = n^2p + n$; each of the n equations has np lagged regressors and an intercept.
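
The two criteria translate directly into code. A minimal sketch: the function name `var_aic_sbc` is hypothetical, and `np.linalg.slogdet` is used to obtain $\ln|\Sigma|$ without forming the determinant itself.

```python
import numpy as np

def var_aic_sbc(sigma, T, n, p):
    """AIC = T ln|Sigma| + 2N and SBC = T ln|Sigma| + N ln(T), N = n^2 p + n."""
    N = n * n * p + n
    _, logdet = np.linalg.slogdet(sigma)   # natural log of |det(sigma)|
    return T * logdet + 2 * N, T * logdet + N * np.log(T)

# Illustration with an identity covariance matrix, so that ln|Sigma| = 0
# and only the penalty terms remain.
aic, sbc = var_aic_sbc(np.eye(2), T=100, n=2, p=4)
print(aic, sbc)
```

When comparing lag lengths, each candidate model should be estimated over the same T observations before the criteria are computed.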

Adding regressors will reduce $\ln|\Sigma|$ at the expense of increasing N.
As in the univariate case, select the model with the lowest AIC or SBC value. Make
sure that you adequately compare the models using the same number of observations
in each. Note that the multivariate AIC and SBC cannot be used to test the statistical
significance of alternative models. Instead, they are measures of the overall fit of the
alternatives. As in the univariate case, there are a number of ways that researchers and
software packages use to report the multivariate generalizations of the AIC and SBC.
Often, these values will be reported as


$$
\begin{aligned}
\mathrm{AIC}^* &= -2\ln(L)/T + 2N/T \\
\mathrm{SBC}^* &= -2\ln(L)/T + N\ln(T)/T
\end{aligned}
$$


where L = maximized value of the multivariate log likelihood function. Note that some
packages will omit the T in the denominator.


Granger Causality 


One test of causality is whether the lags of one variable enter into the equation for
another variable. In a two-equation model with p lags, $\{y_t\}$ does not Granger cause
$\{z_t\}$ if and only if all of the coefficients of $A_{21}(L)$ are equal to zero. Thus, if $\{y_t\}$ does
not improve the forecasting performance of $\{z_t\}$, then $\{y_t\}$ does not Granger cause $\{z_t\}$.
If all variables in the VAR are stationary, the direct way to test Granger causality is to
use a standard F-test of the restriction


$$
a_{21}(1) = a_{21}(2) = a_{21}(3) = \cdots = a_{21}(p) = 0
$$


It is straightforward to generalize this notion to the n-variable case of (5.43). Since
$A_{ij}(L)$ represents the coefficients of lagged values of variable j on variable i, variable j
does not Granger cause variable i if all the coefficients of the polynomial $A_{ij}(L)$ can be
set equal to zero.
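
For the two-variable case, the F-test can be sketched as a comparison of restricted and unrestricted sums of squared residuals. The function name `granger_f` and the simulated data are hypothetical; the sketch assumes both series are stationary, as the text requires.

```python
import numpy as np

def granger_f(z, y, p):
    """F-statistic for H0: the p lags of y do not enter the z equation."""
    T = len(z)
    target = z[p:]
    lags_z = [z[p - k:T - k] for k in range(1, p + 1)]
    lags_y = [y[p - k:T - k] for k in range(1, p + 1)]
    X_r = np.column_stack([np.ones(T - p)] + lags_z)   # restricted model
    X_u = np.column_stack([X_r] + lags_y)              # adds the lags of y

    def ssr(X):
        b, *_ = np.linalg.lstsq(X, target, rcond=None)
        r = target - X @ b
        return r @ r

    ssr_r, ssr_u = ssr(X_r), ssr(X_u)
    df_den = (T - p) - X_u.shape[1]
    F = ((ssr_r - ssr_u) / p) / (ssr_u / df_den)
    return F, (p, df_den)

# Simulated data in which lagged y clearly helps to predict z.
rng = np.random.default_rng(0)
T = 300
y = rng.standard_normal(T)
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.5 * z[t - 1] + 0.8 * y[t - 1] + rng.standard_normal()

F, (df_num, df_den) = granger_f(z, y, p=2)
print(round(F, 2), df_num, df_den)
```

A large F-statistic relative to the critical value with (p, T − 3p − 1) degrees of freedom leads to rejecting the null that $\{y_t\}$ does not Granger cause $\{z_t\}$.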

Note that Granger causality is something quite different from a test for exogeneity.
For $z_t$ to be exogenous, we would require that it not be affected by the contemporaneous
value of $y_t$. However, Granger causality refers only to the effects of past values
of $\{y_t\}$ on the current value of $z_t$. Hence, Granger causality actually measures whether
current and past values of $\{y_t\}$ help to forecast future values of $\{z_t\}$. To illustrate the
distinction in terms of a VMA model, consider the following equation such that $y_t$ does
not Granger cause $z_t$ yet $z_t$ is not exogenous:


$$
z_t = \bar{z} + \phi_{21}(0)\varepsilon_{yt} + \sum_{i=0}^{\infty} \phi_{22}(i)\varepsilon_{zt-i}
$$


If we forecast $z_{t+1}$ conditional on the values of the $\varepsilon_{zt-i}$ (i = 0, 1, ...) alone, we
obtain the forecast error $\phi_{21}(0)\varepsilon_{yt+1} + \phi_{22}(0)\varepsilon_{zt+1}$. Yet, we get the same forecast error
if we forecast $z_{t+1}$ conditional on $\varepsilon_{yt-i}$ and $\varepsilon_{zt-i}$ (i = 0, 1, ...). Given the value of $z_t$,
information concerning $y_t$ does not aid in reducing the forecast error for $z_{t+1}$. In other
words, for the model under consideration, $E_t(z_{t+1}|z_t) = E_t(z_{t+1}|z_t, y_t)$. Thus, $\{y_t\}$ does
not Granger cause $\{z_t\}$. On the other hand, since we are assuming that $\phi_{21}(0)$ is not
zero, $\{z_t\}$ is not exogenous. Clearly, if $\phi_{21}(0)$ is not zero, pure shocks to $y_{t+1}$ (i.e.,
$\varepsilon_{yt+1}$) affect the value of $z_{t+1}$ even though the $\{y_t\}$ sequence does not Granger cause
the $\{z_t\}$ sequence.

A block-exogeneity test is useful for detecting whether to incorporate an additional
variable into a VAR. Given the aforementioned distinction between causality
and exogeneity, this multivariate generalization of the Granger causality test should
actually be called a block-causality test. In any event, the issue is to determine whether
lags of one variable (say, $w_t$) Granger cause any other of the variables in the system.
In the three-variable case with $w_t$, $y_t$, and $z_t$, the test is whether lags of $w_t$ Granger cause
either $y_t$ or $z_t$. In essence, the block-exogeneity test restricts all lags of $w_t$ in the $y_t$ and $z_t$
equations to be equal to zero. This cross-equation restriction is properly tested using the likelihood
ratio test given by (5.44). Estimate the $y_t$ and $z_t$ equations using lagged values of $\{y_t\}$,
$\{z_t\}$, and $\{w_t\}$ and calculate $\Sigma_u$. Reestimate excluding the lagged values of $\{w_t\}$ and
calculate $\Sigma_r$. Next, find the likelihood ratio statistic:


$$
(T - c)(\ln|\Sigma_r| - \ln|\Sigma_u|)
$$


As in (5.44), this statistic has a $\chi^2$ distribution with degrees of freedom equal to
2p (since p lagged values of $\{w_t\}$ are excluded from each equation). Here c = 3p + 1
because the unrestricted $y_t$ and $z_t$ equations contain p lags of $\{y_t\}$, $\{z_t\}$, and $\{w_t\}$ plus
a constant.


Granger Causality and Money Supply Changes


The usefulness of Granger causality tests can be illustrated by a reconsideration 
of the type of time-series equation used in the St. Louis model. Through the late 
1970s, the conventional wisdom was that fluctuations in money contained useful 
information about the future values of real income and prices. In fact, the argument 
in favor of conducting an active monetary policy is that there exists a systematic 
relationship between current values of the money supply and future values of the price 
level and/or real income. However, there is a large body of literature indicating that 
this relationship broke down in the late 1970s. In an influential article, Friedman and 
Kuttner (1992) argued that the issue is whether fluctuations in money help predict 
future fluctuations in income that are not already predicted on the basis of income 
itself or other readily observable variables. Consider the VAR equation 


$$\Delta y_t = \alpha + \sum_{i=1}^{4} \beta_i \Delta m_{t-i} + \sum_{i=1}^{4} \gamma_i \Delta g_{t-i} + \sum_{i=1}^{4} \delta_i \Delta y_{t-i} + \varepsilon_t$$


Notice how this equation differs from the St. Louis model given by (5.16). Here,
the logarithmic change in nominal income ($\Delta y_t$) depends on its own past values and
on the past values of the logarithmic changes in the nominal money supply ($\Delta m_t$) and
federal government expenditures ($\Delta g_t$).

The issue is simple; in the presence of past values of $\{\Delta y_t\}$ and $\{\Delta g_t\}$,
does knowledge of the money supply series provide any information about the future value of
nominal income? Toward this end, Friedman and Kuttner (1992) used several measures
of the money supply (e.g., the money base, M1, M2, and various short-term
interest rates) and estimated a three-variable VAR over various sample periods. For the
1960Q2-1979Q2 period, the F-statistic for the null hypothesis that the money base
does not Granger cause $\Delta y_t$ is 3.68. At the 1% significance level, it is possible to
conclude that money Granger causes $\{\Delta y_t\}$. However, for the 1970Q3-1990Q4 period, the
F-statistic is only 0.82; hence, at any conventional significance level, money does not
Granger cause income. The findings are quite robust to the other measures of the
monetary variable. Until 1979Q2, all of the monetary aggregates Granger cause nominal
income at the 1% significance level. None of these aggregates Granger causes nominal
income in the latter period.
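
For readers who want to see where such an F-statistic comes from, here is a minimal
sketch of the standard F-test for excluding a block of lags from a single equation. The
sums of squared residuals and the sample size are hypothetical numbers, not values from
Friedman and Kuttner (1992), and the function name `granger_f_stat` is invented for the
illustration.

```python
def granger_f_stat(ssr_restricted, ssr_unrestricted, q, T, k):
    """F-statistic for q exclusion restrictions in one regression.

    ssr_restricted:   sum of squared residuals with the lags of money excluded
    ssr_unrestricted: sum of squared residuals with the lags of money included
    q:                number of excluded coefficients (4 lags of money here)
    T:                number of usable observations
    k:                number of coefficients in the unrestricted equation
    """
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (T - k)
    return numerator / denominator


# Hypothetical inputs: intercept plus 4 lags of each of 3 variables gives k = 13
f_stat = granger_f_stat(ssr_restricted=1.2, ssr_unrestricted=1.0, q=4, T=77, k=13)
print(f"F({4}, {77 - 13}) = {f_stat:.2f}")
```

The resulting value is compared with an F critical value with $q$ and $T - k$ degrees of
freedom.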

To provide a better understanding of the interrelationships among the three variables,
Friedman and Kuttner also reported the results of the variance decompositions.
For the 1960Q2-1979Q2 period, M1 explained 27% of the forecast error variance
in $\{\Delta y_t\}$ at both the four- and eight-quarter forecast horizons. In contrast, for the
1970Q3-1990Q4 period, M1 explained about 10% of the forecast error variance in
$\{\Delta y_t\}$ at both the four- and eight-quarter forecast horizons. These results are in striking
contrast to those of the St. Louis equation. Undoubtedly, money supply changes have
become less useful in predicting the future path of nominal income.


Tests with Nonstationary Variables 


In Chapter 4, we saw that it is possible to perform hypothesis tests on an individual
equation when some of the regressors are stationary and others are nonstationary. In
particular, Rule 1 of Sims, Stock, and Watson (1990) was used to select the appropriate
lag length in an augmented Dickey-Fuller test. The issue is particularly relevant to
VARs since many of the regressors are likely to be nonstationary. Recall that a key
finding of Sims, Stock, and Watson (1990) is: If the coefficient of interest can be written
as a coefficient on a zero-mean stationary variable, then a t-test is appropriate. If the
sample size is large, you can use the normal approximation for the t-test. To take a
specific example, consider the following equation from a two-variable VAR:


$$y_t = a_{11} y_{t-1} + a_{12} y_{t-2} + b_{11} z_{t-1} + b_{12} z_{t-2} + \varepsilon_t \tag{5.45}$$


First consider the case in which $\{y_t\}$ is $I(1)$ and $\{z_t\}$ is $I(0)$. Since $b_{11}$ and
$b_{12}$ are coefficients on stationary variables, it is possible to use a t-test to test the
hypothesis $b_{11} = 0$ or $b_{12} = 0$ and an F-test to test the hypothesis
$b_{11} = b_{12} = 0$. Hence, lag lengths involving $\{z_t\}$ and the test to determine whether
$\{z_t\}$ Granger causes $\{y_t\}$ can be performed using the t- or F-distributions.

Notice that it is possible to use a t-test for the restriction $a_{11} = 0$ or $a_{12} = 0$. You
can perform both of these tests even though $\{y_t\}$ is not stationary. However, you cannot
test the restriction $a_{11} = a_{12} = 0$ using an F-test. To make the point, add and subtract
$a_{12} y_{t-1}$ to the right-hand side of (5.45) to obtain


$$y_t = a_{11} y_{t-1} + a_{12} y_{t-1} - a_{12}(y_{t-1} - y_{t-2}) + b_{11} z_{t-1} + b_{12} z_{t-2} + \varepsilon_t$$

and if we define $a_{11} + a_{12} = \gamma$, we can write

$$y_t = \gamma y_{t-1} - a_{12} \Delta y_{t-1} + b_{11} z_{t-1} + b_{12} z_{t-2} + \varepsilon_t$$


The coefficient $a_{12}$ multiplies the stationary variable $\Delta y_{t-1}$ so that it is permissible
to test the null hypothesis $a_{12} = 0$ using a t-test. Alternatively, add and subtract
$a_{11} y_{t-2}$ to the right-hand side of (5.45) to obtain


$$y_t = a_{11} \Delta y_{t-1} + \gamma y_{t-2} + b_{11} z_{t-1} + b_{12} z_{t-2} + \varepsilon_t$$


Thus, the null hypothesis $a_{11} = 0$ can similarly be tested using a t-statistic. It is
important to recognize that the individual coefficients may have normal distributions,
but the sum $a_{11} + a_{12} = \gamma$ does not have a normal distribution. It is impossible to
isolate $\gamma$ as the coefficient on a stationary variable.

Now suppose that $\{y_t\}$ and $\{z_t\}$ are both $I(1)$. It is easy to show that the coefficients
$a_{12}$ and $b_{12}$ can be written as coefficients on stationary variables. Add and subtract both
$a_{12} y_{t-1}$ and $b_{12} z_{t-1}$ to the right-hand side of (5.45) so that the equation becomes


$$y_t = (a_{11} + a_{12}) y_{t-1} - a_{12}(y_{t-1} - y_{t-2}) + (b_{11} + b_{12}) z_{t-1} - b_{12}(z_{t-1} - z_{t-2}) + \varepsilon_t$$


or

$$y_t = \gamma_1 y_{t-1} - a_{12} \Delta y_{t-1} + \gamma_2 z_{t-1} - b_{12} \Delta z_{t-1} + \varepsilon_t \tag{5.46}$$


where $\gamma_1 = a_{11} + a_{12}$ and $\gamma_2 = b_{11} + b_{12}$.

Thus, it is possible to perform the lag length test $a_{12} = b_{12} = 0$ using an
F-distribution. Equation (5.46) shows that it is possible to rewrite (5.45) in such a
way that both coefficients multiply stationary variables. As such, an F-test can be
used to test the joint restriction $a_{12} = b_{12} = 0$. However, the restriction that $\{z_t\}$ does
not Granger cause $\{y_t\}$ involves setting $\gamma_2 = b_{12} = 0$. Since $\gamma_2$ is a coefficient
on a nonstationary variable, the test is nonstandard; a standard F-statistic is not
appropriate. Only if you know that $\gamma_2 = 0$ can you perform a test to determine whether
$\{z_t\}$ Granger causes $\{y_t\}$. Given that $\gamma_2 = 0$, (5.46) becomes


$$y_t = \gamma_1 y_{t-1} - a_{12} \Delta y_{t-1} - b_{12} \Delta z_{t-1} + \varepsilon_t$$


Now, it is possible to perform the causality test since only $b_{12}$ needs to be restricted.
In the same way, if it is known that $\gamma_1 = 1$, we can write


$$\Delta y_t = -a_{12} \Delta y_{t-1} - b_{12} \Delta z_{t-1} + \varepsilon_t$$


Now the VAR is entirely in first differences. As such, all coefficients multiply sta- 
tionary variables. These results are quite general and hold for higher-order systems 
containing any number of lags. To summarize, in a VAR with stationary and nonsta- 
tionary variables 


1. You can use t-tests or F-tests on the stationary variables. 


2. You can perform a lag length test on any variable or any set of variables. This 
is true regardless of whether the variable in question is stationary. 

3. You may be able to use an F-test to determine whether a nonstationary variable
Granger causes another nonstationary variable. If the causal variable
can be made to appear only in first differences, the test is permissible. For
example, suppose that $y_t$, $z_t$, and $x_t$ are all $I(1)$ and that it is possible to write
the equation for $\{y_t\}$ as $y_t = \gamma_1 y_{t-1} + a_{12} \Delta y_{t-1} + a_{13} \Delta y_{t-2} + b_{12} \Delta z_{t-1} +
b_{13} \Delta z_{t-2} + \gamma_3 x_{t-1} + c_{12} \Delta x_{t-1} + c_{13} \Delta x_{t-2} + \varepsilon_t$. It is possible to determine
whether $z_t$ Granger causes $\{y_t\}$ but not whether $x_t$ Granger causes $\{y_t\}$. Similarly,
you cannot test the joint restriction $\gamma_1 = a_{12} = 0$.

4. The issue of differencing is important. If the VAR can be written entirely in 
first differences, hypothesis tests can be performed on any equation or any 
set of equations using t-tests or F-tests. This follows because all of the variables
are stationary. As you will see in Chapter 6, it is possible to write the
VAR in first differences if the variables are $I(1)$ and are not cointegrated.

If the variables in question are cointegrated, the VAR cannot be written in 
first differences; hence, causality tests cannot be performed using t-tests or 
F-tests. 
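
The rewriting that underlies these rules is purely algebraic, and it can be checked
numerically. The sketch below is only an illustration: the coefficient values are
hypothetical, the two simulated random walks merely stand in for $I(1)$ series, and the
check covers the deterministic parts of (5.45) and (5.46), which should coincide
observation by observation.

```python
import random

random.seed(0)

# Hypothetical coefficients for equation (5.45)
a11, a12, b11, b12 = 0.7, 0.3, 0.5, -0.2

# Two simulated random walks as stand-ins for I(1) series
y, z = [0.0], [0.0]
for _ in range(50):
    y.append(y[-1] + random.gauss(0, 1))
    z.append(z[-1] + random.gauss(0, 1))

# Coefficients on the lagged levels in (5.46)
gamma1 = a11 + a12
gamma2 = b11 + b12

max_gap = 0.0
for t in range(2, len(y)):
    # Right-hand side of (5.45), without the error term
    in_levels = a11 * y[t - 1] + a12 * y[t - 2] + b11 * z[t - 1] + b12 * z[t - 2]
    # Right-hand side of (5.46): levels at t-1 plus lagged first differences
    dy = y[t - 1] - y[t - 2]
    dz = z[t - 1] - z[t - 2]
    reparameterized = gamma1 * y[t - 1] - a12 * dy + gamma2 * z[t - 1] - b12 * dz
    max_gap = max(max_gap, abs(in_levels - reparameterized))

print(f"largest discrepancy between (5.45) and (5.46): {max_gap:.2e}")
```

Because the two forms are identical, a restriction such as $a_{12} = b_{12} = 0$ involves
only coefficients on differenced (stationary) regressors, whereas a restriction on
`gamma2` involves a coefficient on a level.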


9. EXAMPLE OF A SIMPLE VAR: DOMESTIC 
AND TRANSNATIONAL TERRORISM 


In the study by Enders, Sandler, and Gaibulloev (2010), we decomposed the Depart- 
ment of Homeland Security’s Global Terrorism Database (GTD) into the transnational 
and domestic terrorism series shown in Figure 5.1. The standard presumption is that 
transnational terrorism responds to events on the international stage, whereas domes- 
tic terrorism responds to country-specific events. In some sense, such thinking implies 
a weak set of interactions between the two types of terrorism since events occurring 
within countries are likely to be idiosyncratic. Nevertheless, if you examine Figure 5.1,
it should be clear that the two series do bear a striking relationship to each other. To
explain the likely comovements, we hypothesized that a planned domestic incident 
could inadvertently result in collateral damage to foreigners. It is also likely that ter- 
rorist groups learn from each other so that a successful domestic incident could have a 
demonstration effect on transnational incidents and vice versa. Still another explanation 
is that certain political events, such as the continuing Arab-Israeli conflict, generate 
grievances that give rise to both domestic and transnational terrorist incidents. 

To examine the strength of any potential relationships between the domestic and 
transnational series, we estimated a VAR of the form 


$$\mathit{dom}_t = a_{10} + A_{11}(L)\,\mathit{dom}_{t-1} + A_{12}(L)\,\mathit{trans}_{t-1} + e_{1t} \tag{5.47}$$

$$\mathit{trans}_t = a_{20} + A_{21}(L)\,\mathit{dom}_{t-1} + A_{22}(L)\,\mathit{trans}_{t-1} + e_{2t} \tag{5.48}$$


where $\mathit{trans}_t$ is the number of transnational terrorist incidents in quarter $t$;
$\mathit{dom}_t$ is the number of domestic terrorist incidents in quarter $t$; $a_{10}$ and $a_{20}$
are intercepts; the $A_{ij}(L)$ are polynomials in the lag operator $L$; and $e_{1t}$ and $e_{2t}$ are
the serially uncorrelated error terms such that $E(e_{1t}e_{2t})$ is not necessarily zero.

You can follow along using the data in the file TERRORISM.XLS. However, 
because of some changes in GTD’s coding conventions and some other problems in 
constructing the data set, it is best to begin the estimation at April 1979 (a date that 
corresponds to the takeover of the U.S. embassy in Teheran). 


Empirical Methodology 


Since we are especially concerned about Granger causality, we need to ascertain
whether the variables are stationary or nonstationary. Toward this end, we performed
both the Dickey-Fuller (1979) and the Elliott, Rothenberg, and Stock (1996) unit root
tests on the $\mathit{trans}_t$ and $\mathit{dom}_t$ series excluding the observations prior to April 1979.
If you use the general-to-specific method to determine the appropriate number of
augmented lags, you should find a lag length of 2 for $\mathit{dom}_t$ and a lag length of 1 for
$\mathit{trans}_t$. The t-statistics for the two tests are


             DF test    ERS test
dom_t         -2.69       -2.43
trans_t       -2.64       -2.47


From Table A, the 0.10 and 0.05 critical values for the Dickey-Fuller $\tau_\mu$ test are
-2.58 and -2.89, respectively. Hence, if you use the Dickey-Fuller test, for each series,
it is possible to reject the null hypothesis of a unit root at the 10% level but not at the 5%
level. Recall that the critical values of the ERS test without a trend term are taken from
the Dickey-Fuller $\tau$ test. The critical values at the 10% and 5% significance levels are
-1.61 and -1.95, respectively. Therefore, if you use the more powerful ERS test, it is
possible to reject the null hypothesis of a unit root at the 5% level. As such, it seems
reasonable to proceed as if the variables are stationary.
The polynomials $A_{12}(L)$ and $A_{21}(L)$ in (5.47) and (5.48) are of particular interest. If
all of the coefficients of $A_{12}(L)$ are zero, then knowledge of the transnational series does
not reduce the forecast error variance of domestic incidents. Formally, transnational
terrorism would not Granger cause domestic terrorism. Unless there is a contemporaneous
response of domestic to transnational terrorism, the domestic series would evolve
independently of $\mathit{trans}_t$. In the same way, if all of the coefficients of $A_{21}(L)$ are zero, then
$\mathit{dom}_t$ does not Granger cause $\mathit{trans}_t$. The absence of a statistically significant
contemporaneous correlation of the error terms would then imply that domestic terrorism does not
affect transnational terrorism. If, instead, any of the coefficients in these polynomials
differ from zero, there are interactions between the two series.

The next issue is to determine the lag length to use for the VAR. Since we are 
using a multivariate model, there is no reason to use the lag lengths selected for 
the Dickey-Fuller test. If you begin with a lag length of 4, you should find that
the general-to-specific method and the multivariate AIC select a lag length of 3. 
Although the multivariate SBC selects a lag length of 2, in order to ensure that all of 
the dynamics are captured by the VAR, proceed using a lag length of 3. Because each 
equation has identical right-hand side variables, ordinary least squares (OLS) is an 
efficient estimation technique. 
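
To see why the two criteria can disagree, it helps to compute them side by side. The
sketch below assumes the usual multivariate forms $AIC = T \ln|\Sigma| + 2N$ and
$SBC = T \ln|\Sigma| + N \ln(T)$, where $N$ is the total number of parameters estimated in
all equations. The determinants are hypothetical values chosen for illustration (they are
not taken from TERRORISM.XLS), and the function name is invented here.

```python
import math


def var_information_criteria(det_sigma, T, n, p):
    """Multivariate AIC and SBC for an n-variable VAR with p lags and intercepts.

    det_sigma: determinant of the residual variance/covariance matrix
    T:         number of usable observations (held fixed across lag lengths)
    """
    N = n * n * p + n  # n*p slope coefficients plus an intercept in each of n equations
    aic = T * math.log(det_sigma) + 2 * N
    sbc = T * math.log(det_sigma) + N * math.log(T)
    return aic, sbc


# Hypothetical determinants of the residual covariance matrix for lag lengths 1 to 4
dets = {1: 2500.0, 2: 2100.0, 3: 1900.0, 4: 1850.0}
T = 120

criteria = {p: var_information_criteria(d, T, n=2, p=p) for p, d in dets.items()}
best_aic = min(criteria, key=lambda p: criteria[p][0])
best_sbc = min(criteria, key=lambda p: criteria[p][1])

for p, (aic, sbc) in criteria.items():
    print(f"p = {p}: AIC = {aic:8.2f}  SBC = {sbc:8.2f}")
print(f"AIC selects p = {best_aic}; SBC selects p = {best_sbc}")
```

With these made-up numbers, the heavier penalty in the SBC leads it to choose a shorter
lag length than the AIC, which is the same pattern reported for the terrorism data.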


Empirical Results 


Once the VAR has been estimated, it is straightforward to determine the causality 
between the variables. Consider the following F-tests (with significance levels in 
parentheses) 


All coefficients of $A_{11}(L) = 0$ : 38.49 (0.000)
All coefficients of $A_{12}(L) = 0$ : 1.86 (0.159)
All coefficients of $A_{21}(L) = 0$ : 3.36 (0.015)
All coefficients of $A_{22}(L) = 0$ : 25.64 (0.000)


As expected, the F-statistics for $A_{11}(L)$ and $A_{22}(L)$ are both highly significant,
indicating that each variable is helpful in predicting its own future values. The F-statistic
for the null hypothesis that transnational terrorism does not Granger cause domestic terrorism
is 1.86. Given the prob-value of 0.159, this noncausality seems to be in accordance
with the conventional wisdom. The important result is that domestic terrorism Granger
causes transnational terrorism at the 0.015 significance level. Thus, unlike the
conventional wisdom, the two series do not evolve independently of each other. The
explanation for this unidirectional causality is that conflicts begin locally but, over time,
can spill over into transnational incidents.

To ascertain the importance of the interactions between the two series, we obtained
the variance decompositions. The moving average representations of Equations (5.47)
and (5.48) express $\mathit{dom}_t$ and $\mathit{trans}_t$ as dependent on the current and past values of both
$\{e_{1t}\}$ and $\{e_{2t}\}$ sequences:

$$\mathit{dom}_t = c_0 + \sum_{j=1}^{\infty} \left(c_{1j} e_{1t-j} + c_{2j} e_{2t-j}\right) + e_{1t} \tag{5.49}$$

$$\mathit{trans}_t = d_0 + \sum_{j=1}^{\infty} \left(d_{1j} e_{1t-j} + d_{2j} e_{2t-j}\right) + e_{2t} \tag{5.50}$$


where $c_0$, $d_0$, $c_{1j}$, $c_{2j}$, $d_{1j}$, and $d_{2j}$ are all parameters.


Table 5.3 Percent of Forecast Error Variance Accounted for by Each Shock

                       % Variance of dom_t                 % Variance of trans_t
                  Shock to dom_t  Shock to trans_t    Shock to dom_t  Shock to trans_t
1-Step ahead          100.0             0.0                 3.1             96.9
4-Steps ahead          97.9             2.1                11.6             88.4
8-Steps ahead          98.1             1.9                24.9             75.1
12-Steps ahead         97.6             2.4                33.6             66.4

Because we cannot estimate (5.49) and (5.50) directly, we used the residuals of
(5.47) and (5.48) and then decomposed the variances of $\mathit{dom}_t$ and $\mathit{trans}_t$ into the
percentages attributable to each type of innovation. We used the orthogonalized
innovations obtained from a Choleski decomposition; the order of the variables in the
factorization had no qualitative effects on our results (the contemporaneous correlation
between $e_{1t}$ and $e_{2t}$ is 0.18). We report results such that the shock to $\mathit{trans}_t$ has no
contemporaneous effect on $\mathit{dom}_t$.

The variance decompositions for 1-, 4-, 8-, and 12-quarter forecasting horizons are
reported in Table 5.3. As expected, each series explains the preponderance of its own
forecast error variance at short forecasting horizons. For example, at a four-step-ahead
forecasting horizon, domestic terrorism explains 97.9% of its forecast error variance, while
$\mathit{trans}_t$ explains 88.4% of its forecast error variance. As the forecasting horizon expands, the
effect of $\mathit{trans}_t$ shocks on the variance of $\mathit{dom}_t$ remains small. However, after 12
quarters, $\mathit{dom}_t$ explains 33.6% of the forecast error variance of transnational terrorism. Not
only is causality unidirectional (i.e., domestic terrorism Granger causes transnational
terrorism) but also the effect of domestic terrorism on transnational terrorism is
substantial.
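
The mechanics behind a table such as Table 5.3 can be sketched for a simpler case. The
block below is only an illustration under stated assumptions: it uses a bivariate VAR(1)
rather than the three-lag model estimated in the text, the coefficient matrix `A` and the
residual covariance matrix `sigma` are hypothetical (chosen so that the residual
correlation is 0.18), and the helper names are invented here. The first variable is
ordered first in the Choleski factorization, so the second shock has no contemporaneous
effect on it.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def cholesky2(s):
    """Lower-triangular P such that P P' = s for a 2x2 covariance matrix."""
    p11 = math.sqrt(s[0][0])
    p21 = s[1][0] / p11
    p22 = math.sqrt(s[1][1] - p21 ** 2)
    return [[p11, 0.0], [p21, p22]]


def fevd(A, sigma, horizon):
    """Percent of each variable's h-step forecast error variance due to each shock."""
    P = cholesky2(sigma)
    power = [[1.0, 0.0], [0.0, 1.0]]          # A^0
    contrib = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(horizon):
        theta = matmul(power, P)              # orthogonalized MA coefficients A^j P
        for i in range(2):
            for k in range(2):
                contrib[i][k] += theta[i][k] ** 2
        power = matmul(A, power)              # move on to A^(j+1)
    return [[100.0 * c / sum(row) for c in row] for row in contrib]


# Hypothetical VAR(1): index 0 plays the role of dom_t, index 1 of trans_t
A = [[0.80, 0.02], [0.05, 0.50]]
sigma = [[1.00, 0.09], [0.09, 0.25]]          # correlation = 0.09 / (1.0 * 0.5) = 0.18

for h in (1, 4, 8, 12):
    shares = fevd(A, sigma, h)
    print(h, [round(v, 1) for v in shares[0]], [round(v, 1) for v in shares[1]])
```

At the one-step horizon the first variable is explained entirely by its own shock, which
is a direct consequence of the ordering rather than a finding about the data.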

Figure 5.8 shows the impulse responses of each series to domestic and transna- 
tional terrorism shocks. The solid lines represent the impulse responses, and the dashed 
lines represent a 95% confidence interval. The response of domestic terrorism to its own
one standard deviation shock (equal to 48.15 incidents per quarter) is shown in the 
upper left-hand panel. Notice that the response is quite persistent in that it remains sta- 
tistically significant for 10 quarters. The upper right-hand panel shows that the response 
of domestic terrorism to a transnational shock is never significant. The effect of the 
domestic shock on transnational terrorism is shown in the lower left-hand panel. At 
onset, the level of transnational terrorism jumps by about 2 incidents per quarter and 
remains significant for 18 quarters. The cumulated sum of all transnational incidents 
to the domestic terrorism shock (i.e., the sum of the impulse responses) is 34.29 inci- 
dents. The response of transnational terrorism to its own one standard deviation shock 
is shown in the lower-right panel of Figure 5.8. As compared to domestic terrorism, 
the responses are not very persistent. After an initial jump, there is a sharp decline in
the impulse responses such that they are insignificantly different from zero after six
quarters.

Overall, the conventional wisdom that the two series evolve independently of each 
other is not supported by the data. Given the unidirectional causality from domestic to 
transnational terrorism, it appears that conflicts involving terrorism begin locally and,
over time, spread to become transnational.


10. STRUCTURAL VARs 


Sims’s (1980) VAR approach has the desirable property that all variables are treated
symmetrically so that all variables are jointly endogenous and the econometrician
does not rely on any “incredible identification restrictions.” Consider a first-order VAR
system of the type represented by (5.19):


$$x_t = A_0 + A_1 x_{t-1} + e_t$$


Although the VAR approach yields only estimated values of $A_0$ and $A_1$, for
exposition purposes, it is useful to treat each as being known. As we saw in (5.42), the
n-step-ahead forecast error is


$$x_{t+n} - E_t x_{t+n} = e_{t+n} + A_1 e_{t+n-1} + A_1^2 e_{t+n-2} + \cdots + A_1^{n-1} e_{t+1} \tag{5.51}$$


Even though econometric analysis will never reveal the actual values of $A_0$ and
$A_1$, an appropriately specified model will have forecasts that are unbiased and have
minimum variance. A researcher interested only in forecasting might want to trim down 
the overparameterized VAR model in order to improve the precision of the estimates 
and reduce the forecast-error variance. Nonetheless, it should be clear that forecasting 
with a VAR is a multivariate extension of forecasting using a simple autoregression. 

However, given the somewhat ad hoc nature of the Choleski decomposition, the 
beauty of the approach seems diminished when constructing impulse response func- 
tions and forecast error variance decompositions. Moreover, the VAR approach has 
been criticized as being devoid of any economic content. The sole role of the economist 
is to suggest the appropriate variables to include in the VAR. From that point on, the 
procedure is almost mechanical. However, it is possible to use an economic theory to 
impose restrictions on the variables so that the results are not ad hoc. 


Structural Decompositions 


Unless the underlying structural model can be identified from the reduced-form VAR
model, the innovations in a Choleski decomposition do not have a direct economic
interpretation. However, instead of using a Choleski decomposition, it is possible to
impose restrictions on the errors so as to fully identify the structural shocks in a way that
is consistent with an underlying economic model. Reconsider the two-variable VAR of
(5.17) and (5.18):


$$y_t + b_{12} z_t = b_{10} + \gamma_{11} y_{t-1} + \gamma_{12} z_{t-1} + \varepsilon_{yt}$$

$$b_{21} y_t + z_t = b_{20} + \gamma_{21} y_{t-1} + \gamma_{22} z_{t-1} + \varepsilon_{zt}$$

so that it is possible to write the model in the form of (5.20) and (5.21):


$$y_t = a_{10} + a_{11} y_{t-1} + a_{12} z_{t-1} + e_{1t}$$

$$z_t = a_{20} + a_{21} y_{t-1} + a_{22} z_{t-1} + e_{2t}$$


where the various $a_{ij}$ are defined as in (5.19). For our purposes, the important point to
note is that the two error terms $e_{1t}$ and $e_{2t}$ are actually composites of the underlying
shocks $\varepsilon_{yt}$ and $\varepsilon_{zt}$. From (5.22) and (5.23),


$$\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} = \frac{1}{1 - b_{12} b_{21}} \begin{bmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{bmatrix} \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}$$


Although these composite shocks are the one-step-ahead forecast errors in $y_t$ and
$z_t$, they do not have a structural interpretation. Hence, there is an important difference
between using VARs for forecasting and economic analysis. In (5.51), $e_{1t}$ and $e_{2t}$ are
forecast errors. If we are interested only in forecasting, the components of the forecast
errors are unimportant. Given the economic model of (5.17) and (5.18), $\varepsilon_{yt}$ and $\varepsilon_{zt}$
are the autonomous changes in $y_t$ and $z_t$ in period $t$, respectively. If we want to obtain
the impulse response functions or the variance decompositions, it is necessary to use
the structural shocks (i.e., $\varepsilon_{yt}$ and $\varepsilon_{zt}$), not the forecast errors. The aim of a structural
VAR is to use economic theory (rather than the Choleski decomposition) to recover the
structural innovations from the residuals $\{e_{1t}\}$ and $\{e_{2t}\}$.

The Choleski decomposition actually makes a strong assumption about the underlying
structural errors. Suppose, as in (5.31) to (5.33), we select an ordering such that
$b_{21} = 0$. Recall that with a recursive ordering such that $z_t$ is causally prior to $y_t$, the two
pure innovations can be recovered as


$$\varepsilon_{yt} = e_{1t} + b_{12} e_{2t}$$


and

$$\varepsilon_{zt} = e_{2t}$$


Forcing $b_{21} = 0$ is equivalent to assuming that an innovation in $y_t$ does not have a
contemporaneous effect on $z_t$. Unless there is a theoretical foundation for this assumption,
the underlying shocks are improperly identified. As such, the impulse responses
and variance decompositions resulting from this improper identification can be quite
misleading.

If the correlation coefficient between $e_{1t}$ and $e_{2t}$ is low, the ordering is not likely
to be important. However, in a VAR with several variables, it is improbable that 
all correlations will be small. After all, in selecting the variables to include in a 
model, you are likely to choose variables that exhibit strong comovements. When the 
residuals of a VAR are correlated, it is not practical to try all alternative orderings. 
With a four-variable model, there are 24 (i.e., 4!) possible orderings. Sims (1986) 
and Bernanke (1986) proposed modeling the innovations using economic analysis. 
The basic idea is to estimate the relationships among the structural shocks using an 
economic model. To understand the procedure, it is useful to examine the relationship 
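between the forecast errors and the structural innovations in an n-variable VAR.
Since this relationship is invariant to lag length, consider the first-order model with n
variables:


$$\begin{bmatrix} 1 & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & 1 & b_{23} & \cdots & b_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & b_{n3} & \cdots & 1 \end{bmatrix} \begin{bmatrix} x_{1t} \\ x_{2t} \\ \vdots \\ x_{nt} \end{bmatrix} = \begin{bmatrix} b_{10} \\ b_{20} \\ \vdots \\ b_{n0} \end{bmatrix} + \begin{bmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} & \cdots & \gamma_{1n} \\ \gamma_{21} & \gamma_{22} & \gamma_{23} & \cdots & \gamma_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ \gamma_{n1} & \gamma_{n2} & \gamma_{n3} & \cdots & \gamma_{nn} \end{bmatrix} \begin{bmatrix} x_{1t-1} \\ x_{2t-1} \\ \vdots \\ x_{nt-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{nt} \end{bmatrix}$$


or, in compact form:

$$B x_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t$$


The multivariate generalization of (5.19) is obtained by premultiplying by $B^{-1}$ so
that


$$x_t = B^{-1}\Gamma_0 + B^{-1}\Gamma_1 x_{t-1} + B^{-1}\varepsilon_t$$


Defining $A_0 = B^{-1}\Gamma_0$, $A_1 = B^{-1}\Gamma_1$, and $e_t = B^{-1}\varepsilon_t$ yields (5.19). The problem,
then, is to take the observed values of $e_t$ and to restrict the system so as to recover
$\varepsilon_t$ as $\varepsilon_t = B e_t$. However, the selection of the various $b_{ij}$ cannot be completely arbitrary.
The issue is to restrict the system so as to (i) recover the various $\{\varepsilon_{it}\}$ and (ii) preserve
the assumed error structure concerning the independence of the various $\{\varepsilon_{it}\}$ shocks. To
solve this identification problem, simply count equations and unknowns. Using OLS,
we can obtain the variance/covariance matrix $\Sigma$:


$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{bmatrix}$$


where each element of $\Sigma$ is constructed as the sum

$$\sigma_{ij} = (1/T) \sum_{t=1}^{T} e_{it} e_{jt}$$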


Since $\Sigma$ is symmetric, it contains only $(n^2 + n)/2$ distinct elements. There are $n$
elements along the principal diagonal, $(n - 1)$ along the first off-diagonal, $(n - 2)$ along the
next off-diagonal, ..., and one corner element for a total of $(n^2 + n)/2$ free elements.

Given that the diagonal elements of $B$ are all unity, $B$ contains $n^2 - n$ unknown values.
In addition, there are the $n$ unknown values $\mathrm{var}(\varepsilon_{it})$ for a total of $n^2$ unknown values
in the structural model [i.e., the $n^2 - n$ values of $B$ plus the $n$ values $\mathrm{var}(\varepsilon_{it})$]. Now the
answer to the identification problem is simple; in order to identify the $n^2$ unknowns
from the known $(n^2 + n)/2$ independent elements of $\Sigma$, it is necessary to impose an
additional $n^2 - [(n^2 + n)/2] = (n^2 - n)/2$ restrictions on the system. This result
generalizes to a model with $p$ lags: To identify the structural model from an estimated VAR,
it is necessary to impose $(n^2 - n)/2$ restrictions on the structural model.
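
The counting argument is easy to tabulate. The short sketch below simply restates the
arithmetic for a few system sizes; the function name is hypothetical, and the last entry
counts the zeros that a triangular (Choleski) pattern places on one side of the principal
diagonal of $B$.

```python
def identification_count(n):
    """Count knowns, unknowns, and required restrictions for an n-variable VAR."""
    known = (n * n + n) // 2       # distinct elements of the symmetric matrix Sigma
    unknown = (n * n - n) + n      # off-diagonal elements of B plus n shock variances
    needed = unknown - known       # additional restrictions required
    triangular = sum(range(n))     # (n - 1) + (n - 2) + ... + 1 zeros in a triangular B
    return known, unknown, needed, triangular


for n in (2, 3, 4):
    known, unknown, needed, triangular = identification_count(n)
    print(f"n = {n}: knowns = {known}, unknowns = {unknown}, "
          f"restrictions needed = {needed}, triangular zeros = {triangular}")
```

For every $n$ the last two counts coincide, which is why a recursive ordering delivers
exactly the number of restrictions that the order condition requires.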

Take a moment to count the number of restrictions in a Choleski decomposition. In
the system above, the Choleski decomposition requires all elements above the principal
diagonal to be zero:


$$\begin{aligned} b_{12} = b_{13} = b_{14} = \cdots = b_{1n} &= 0 \\ b_{23} = b_{24} = \cdots = b_{2n} &= 0 \\ b_{34} = \cdots = b_{3n} &= 0 \\ &\;\;\vdots \\ b_{n-1,n} &= 0 \end{aligned}$$


Hence, there are a total of $(n^2 - n)/2$ restrictions; the system is exactly identified.
To take a specific example, consider the following Choleski decomposition in a
three-variable VAR:


$$\begin{aligned} e_{1t} &= \varepsilon_{1t} \\ e_{2t} &= c_{21}\varepsilon_{1t} + \varepsilon_{2t} \\ e_{3t} &= c_{31}\varepsilon_{1t} + c_{32}\varepsilon_{2t} + \varepsilon_{3t} \end{aligned}$$


From the previous discussion, you should be able to demonstrate that $\varepsilon_{1t}$, $\varepsilon_{2t}$, and
$\varepsilon_{3t}$ can be identified from the estimates of $e_{1t}$, $e_{2t}$, and $e_{3t}$ and the variance/covariance
matrix $\Sigma$. In terms of our previous notation, define matrix $C = B^{-1}$ with elements $c_{ij}$.
Hence, $e_t = C\varepsilon_t$. An alternative way to model the relationship between the forecast
errors and the structural innovations is


$$\begin{aligned} e_{1t} &= \varepsilon_{1t} + c_{13}\varepsilon_{3t} \\ e_{2t} &= c_{21}\varepsilon_{1t} + \varepsilon_{2t} \\ e_{3t} &= c_{31}\varepsilon_{1t} + \varepsilon_{3t} \end{aligned}$$


Notice the absence of a triangular structure. Here, the forecast error of each variable
is affected by its own structural innovation and the structural innovation in one
other variable. Given the $(9 - 3)/2 = 3$ restrictions on $C$, the necessary condition for
the exact identification of $B$ and $\varepsilon_t$ is satisfied. However, as illustrated in the next
section, imposing $(n^2 - n)/2$ restrictions is not a sufficient condition for exact
identification. Unfortunately, the presence of nonlinearities means that there are no simple
rules that guarantee exact identification.

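For those wanting a bit more formality, write the variance/covariance matrix of the
regression residuals as

$$E e_t e_t' = \Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix}$$


Given that $e_t = B^{-1}\varepsilon_t$, it must be the case that


$$E e_t e_t' = E B^{-1}\varepsilon_t \varepsilon_t' (B^{-1})' = B^{-1} E(\varepsilon_t \varepsilon_t') (B^{-1})' \tag{5.52}$$


Note that $E(\varepsilon_t \varepsilon_t')$ is the variance/covariance matrix of the structural innovations
($\Sigma_\varepsilon$). Since the covariance between the structural shocks is zero, we can write $\Sigma_\varepsilon$ as


$$\Sigma_\varepsilon = \begin{bmatrix} \mathrm{var}(\varepsilon_1) & 0 \\ 0 & \mathrm{var}(\varepsilon_2) \end{bmatrix}$$

To find the relationship between the structural innovations and the regression residuals,
substitute $\Sigma$ and $\Sigma_\varepsilon$ into (5.52) to obtain


$$\begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} = B^{-1} \begin{bmatrix} \mathrm{var}(\varepsilon_1) & 0 \\ 0 & \mathrm{var}(\varepsilon_2) \end{bmatrix} (B^{-1})'$$


or

$$\begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix}^{-1} \begin{bmatrix} \mathrm{var}(\varepsilon_1) & 0 \\ 0 & \mathrm{var}(\varepsilon_2) \end{bmatrix} \left( \begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix}^{-1} \right)'$$

Since the two sides must be identical element by element, it must be the case that

$$\begin{aligned} \sigma_1^2 &= [1/(1 - b_{12}b_{21})]^2 \, [\mathrm{var}(\varepsilon_1) + b_{12}^2 \mathrm{var}(\varepsilon_2)] \\ \sigma_{12} &= [1/(1 - b_{12}b_{21})]^2 \, [-b_{21}\mathrm{var}(\varepsilon_1) - b_{12}\mathrm{var}(\varepsilon_2)] \\ \sigma_{21} &= [1/(1 - b_{12}b_{21})]^2 \, [-b_{21}\mathrm{var}(\varepsilon_1) - b_{12}\mathrm{var}(\varepsilon_2)] \\ \sigma_2^2 &= [1/(1 - b_{12}b_{21})]^2 \, [\mathrm{var}(\varepsilon_2) + b_{21}^2 \mathrm{var}(\varepsilon_1)] \end{aligned}$$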

Since the four values of $\Sigma$ are known, it would appear that there are four equations
to determine the four unknown values $b_{12}$, $b_{21}$, $\mathrm{var}(\varepsilon_1)$, and $\mathrm{var}(\varepsilon_2)$. However, the
symmetry of the system is such that $\sigma_{21} = \sigma_{12}$ so that there are only three independent
equations to determine the four unknown values. As such, identification is not possible
unless another restriction is imposed.

To generalize the argument to an n-variable VAR system, we have


$$\Sigma = B^{-1} \Sigma_\varepsilon (B^{-1})'$$

where $\Sigma$, $B^{-1}$, and $\Sigma_\varepsilon$ are $n \times n$ matrices. Using the same logic, it is possible to show
that it is necessary to impose $(n^2 - n)/2$ additional restrictions on $B^{-1}$ to completely
identify the system. Some specific examples are considered in Section 11.


11. EXAMPLES OF STRUCTURAL 
DECOMPOSITIONS 


To illustrate the Sims-Bernanke decomposition, suppose that there are five residuals
for $e_{1t}$ and $e_{2t}$. Although a usable sample size of five is unacceptable for estimation
purposes, it does allow us to do the necessary calculations in a simple manner. Thus,
suppose that the five error terms are


t         1      2      3      4      5
e_1t     1.0   -0.5    0.0   -1.0    0.5
e_2t     0.5   -1.0    0.0   -0.5    1.0


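Since $\{e_{1t}\}$ and $\{e_{2t}\}$ are regression residuals, their sums are zero. It is simple
to verify that $\sigma_1^2 = 0.5$, $\sigma_{12} = \sigma_{21} = 0.4$, and $\sigma_2^2 = 0.5$; hence, the variance/covariance
matrix $\Sigma$ is

$$\Sigma = \begin{bmatrix} 0.5 & 0.4 \\ 0.4 & 0.5 \end{bmatrix}$$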
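
If you would rather check the arithmetic by machine, the few lines below reproduce the
elements of the variance/covariance matrix from the five residuals given above, using
$\sigma_{ij} = (1/T)\sum e_{it} e_{jt}$. The helper name `moment` is simply a label chosen for
this sketch.

```python
# The five residuals from the example in the text
e1 = [1.0, -0.5, 0.0, -1.0, 0.5]
e2 = [0.5, -1.0, 0.0, -0.5, 1.0]


def moment(a, b):
    """Average cross product (1/T) * sum of a_t * b_t."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


# Variance/covariance matrix of the residuals
sigma = [[moment(e1, e1), moment(e1, e2)],
         [moment(e2, e1), moment(e2, e2)]]

print("sum of e1:", sum(e1), " sum of e2:", sum(e2))
for row in sigma:
    print(row)
```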
Although the covariance between $\varepsilon_{1t}$ and $\varepsilon_{2t}$ is zero, the variances of $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are
presumably unknown. As in the previous section, let the variance/covariance matrix of
these structural shocks be denoted by $\Sigma_\varepsilon$ so that


$$\Sigma_\varepsilon = \begin{bmatrix} \mathrm{var}(\varepsilon_1) & 0 \\ 0 & \mathrm{var}(\varepsilon_2) \end{bmatrix}$$


The reason that the covariance terms are equal to zero is that $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are deemed
to be pure structural shocks. Moreover, the variance of each shock is time invariant.
For notational convenience, the time subscript can be dropped; for example,
$\mathrm{var}(\varepsilon_{1t}) = \mathrm{var}(\varepsilon_{1t-1}) = \cdots = \mathrm{var}(\varepsilon_1)$. The relationship between the variance/covariance
matrix of the forecast errors (i.e., $\Sigma$) and the variance/covariance matrix of the pure shocks
(i.e., $\Sigma_\varepsilon$) is such that $\Sigma_\varepsilon = B\Sigma B'$. Recall that $e_t$ and $\varepsilon_t$ are the column vectors
$(e_{1t}, e_{2t})'$ and $(\varepsilon_{1t}, \varepsilon_{2t})'$, respectively. Hence,

$$e_t e_t' = \begin{bmatrix} e_{1t}^2 & e_{1t}e_{2t} \\ e_{1t}e_{2t} & e_{2t}^2 \end{bmatrix}$$

so that

$$\Sigma = \frac{1}{T} \sum_{t=1}^{T} e_t e_t' \tag{5.53}$$

Similarly, $\Sigma_\varepsilon$ is

$$\Sigma_\varepsilon = \frac{1}{T} \sum_{t=1}^{T} \varepsilon_t \varepsilon_t' \tag{5.54}$$


To link the two variance/covariance matrices, note that the relationship between $\varepsilon_t$
and $e_t$ is such that $\varepsilon_t = B e_t$. Substitute this relationship into (5.54) and recall that the
transpose of a product is the product of the transposes [i.e., $(B e_t)' = e_t' B'$], so that


$$\Sigma_\varepsilon = \frac{1}{T} \sum_{t=1}^{T} B e_t e_t' B'$$

Thus, using (5.53), we get

$$\Sigma_\varepsilon = B \Sigma B'$$
Using the specific numbers in the example, it follows that

$$\begin{bmatrix} \mathrm{var}(\varepsilon_1) & 0 \\ 0 & \mathrm{var}(\varepsilon_2) \end{bmatrix} = \begin{bmatrix} 1 & b_{12} \\ b_{21} & 1 \end{bmatrix} \begin{bmatrix} 0.5 & 0.4 \\ 0.4 & 0.5 \end{bmatrix} \begin{bmatrix} 1 & b_{21} \\ b_{12} & 1 \end{bmatrix}$$

Since both sides of this equation are equivalent, they must be the same element by
element. Carry out the indicated multiplication of $B\Sigma B'$ to obtain


$$\mathrm{var}(\varepsilon_1) = 0.5 + 0.8 b_{12} + 0.5 b_{12}^2 \tag{5.55}$$

$$0 = 0.5 b_{21} + 0.4 b_{21} b_{12} + 0.4 + 0.5 b_{12} \tag{5.56}$$

$$0 = 0.5 b_{21} + 0.4 b_{12} b_{21} + 0.4 + 0.5 b_{12} \tag{5.57}$$

$$\mathrm{var}(\varepsilon_2) = 0.5 b_{21}^2 + 0.8 b_{21} + 0.5 \tag{5.58}$$

As you can see, equations (5.56) and (5.57) are identical. There are three independent
equations to solve for the four unknowns $b_{12}$, $b_{21}$, $\mathrm{var}(\varepsilon_1)$, and $\mathrm{var}(\varepsilon_2)$. As we
saw in the last section, in a two-variable system, one restriction needs to be imposed if
the structural model is to be identified. Now consider the Choleski decomposition one
more time. If $b_{12} = 0$, we find


$\mathrm{var}(\varepsilon_1) = 0.5$
$0 = 0.5 b_{21} + 0.4$ so that $b_{21} = -0.8$
$0 = 0.5 b_{21} + 0.4$ so that again we find $b_{21} = -0.8$
$\mathrm{var}(\varepsilon_2) = 0.5 (b_{21})^2 + 0.8 b_{21} + 0.5$ so that $\mathrm{var}(\varepsilon_2) = 0.5(0.64) - 0.64 + 0.5 = 0.18$

Using this decomposition, we can recover each $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ as $\varepsilon_t = B e_t$:

$$\varepsilon_{1t} = e_{1t}$$

and

$$\varepsilon_{2t} = -0.8 e_{1t} + e_{2t}$$


Thus, the identified structural shocks are


t           1      2      3      4      5
eps_1t     1.0   -0.5    0.0   -1.0    0.5
eps_2t    -0.3   -0.6    0.0    0.3    0.6

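If you want to take the time, you can verify that $\mathrm{var}(\varepsilon_1) = \Sigma(\varepsilon_{1t})^2/5 = 0.5$,
$\mathrm{var}(\varepsilon_2) = \Sigma(\varepsilon_{2t})^2/5 = 0.18$, and $\mathrm{cov}(\varepsilon_{1t}, \varepsilon_{2t}) = \Sigma \varepsilon_{1t}\varepsilon_{2t}/5 = 0$. Instead, if we impose
the alternative restriction of a Choleski decomposition and set $b_{21} = 0$, from (5.55) to
(5.58), we obtain


$\mathrm{var}(\varepsilon_1) = 0.5 + 0.8 b_{12} + 0.5 b_{12}^2$
$0 = 0.4 + 0.5 b_{12}$ [from both (5.56) and (5.57)] so that $b_{12} = -0.8$
$\mathrm{var}(\varepsilon_2) = 0.5$

Since $b_{12} = -0.8$, $\mathrm{var}(\varepsilon_1) = 0.5 + 0.8(-0.8) + 0.5(0.64) = 0.18$. Now, $B$ is identified as

$$B = \begin{bmatrix} 1 & -0.8 \\ 0 & 1 \end{bmatrix}$$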

If we use the identified values of B, the structural innovations are such that €), = 
e, — 0.8e,, and €5, = ep. Hence, we have the structural innovations 


t 1 2 3 4 5 


ëi 0.6 0.3 0.0 -0.6 -0.3 
Ez 0.5 -1.0 0.0 -0.5 1.0 


In this example, the ordering used in the Choleski decomposition is very important.
This should not be too surprising since the correlation coefficient between $e_{1t}$ and $e_{2t}$
is 0.8. The point is that the ordering will have important implications for the resulting
variance decompositions and impulse response functions. Selecting the first ordering
(i.e., setting $b_{12} = 0$) gives more importance to innovations in $\varepsilon_{1t}$. The assumed timing
is such that $\varepsilon_{1t}$ can have a contemporaneous effect on $x_{1t}$ and $x_{2t}$ while $\varepsilon_{2t}$ shocks can
affect $x_{1t}$ only with a one-period lag. Moreover, the amplitude of the impulse responses
attributable to $\varepsilon_{1t}$ shocks will be increased since the ordering affects the magnitude of
a "typical" (i.e., one standard deviation) shock in $\varepsilon_{1t}$ and decreases the magnitude of a
"typical" $\varepsilon_{2t}$ shock.
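To make the effect of the ordering concrete, the sketch below computes the size of a "typical" (one standard deviation) shock under each of the two Choleski orderings, using the variance/covariance matrix of the example. The helper name `choleski_B` is hypothetical and introduced only for illustration.

```python
import numpy as np

# Residual variance/covariance matrix from the example.
Sigma = np.array([[0.5, 0.4],
                  [0.4, 0.5]])

def choleski_B(Sigma, zero):
    """Return (B, Sigma_eps) for a 2x2 system with b12 or b21 set to zero."""
    B = np.eye(2)
    if zero == "b12":
        B[1, 0] = -Sigma[0, 1] / Sigma[0, 0]   # b21 from (5.56) with b12 = 0
    elif zero == "b21":
        B[0, 1] = -Sigma[0, 1] / Sigma[1, 1]   # b12 from (5.56) with b21 = 0
    else:
        raise ValueError("zero must be 'b12' or 'b21'")
    return B, B @ Sigma @ B.T

typical_shock = {}
for zero in ("b12", "b21"):
    B, Sigma_eps = choleski_B(Sigma, zero)
    typical_shock[zero] = np.sqrt(np.diag(Sigma_eps))
    print(zero, "= 0: sd(eps1), sd(eps2) =", np.round(typical_shock[zero], 3))
```

Whichever shock is placed first in the ordering inherits the full residual variance of its own equation, so its one standard deviation shock is the larger of the two.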

The important point to note is that the Choleski decomposition is only one type of
identification restriction. With three independent equations among the four unknowns
$b_{12}$, $b_{21}$, $\mathrm{var}(\varepsilon_{1t})$, and $\mathrm{var}(\varepsilon_{2t})$, any other linearly independent restriction will allow for
the identification of the structural model. Consider some of the other alternatives:


1. A coefficient restriction. Coefficient restrictions are necessarily short-run
restrictions on the dynamics of the model. The most common restriction
is a zero restriction such that one variable has no contemporaneous effect
on another. However, unlike a Choleski decomposition, there is no need
to rely on a triangular formulation. Another common type of coefficient
restriction involves setting a coefficient to unity. Suppose that we know that a
one-unit innovation $\varepsilon_{2t}$ has a one-unit effect on $x_{1t}$; hence, suppose we know
that $b_{12} = 1$. By using the other three independent equations, it follows that
$\mathrm{var}(\varepsilon_{1t}) = 1.8$, $b_{21} = -1$, and $\mathrm{var}(\varepsilon_{2t}) = 0.2$.

Given that $\varepsilon_t = Be_t$, we obtain

$$\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

so that $\varepsilon_{1t} = e_{1t} + e_{2t}$ and $\varepsilon_{2t} = -e_{1t} + e_{2t}$. If we use the five hypothetical
regression residuals, the decomposed innovations become

t                        1      2      3      4      5
$\varepsilon_{1t}$     1.5   -1.5    0.0   -1.5    1.5
$\varepsilon_{2t}$    -0.5   -0.5    0.0    0.5    0.5


2. A variance restriction. One natural restriction is that $\mathrm{var}(\varepsilon_{1t}) = 1$. To illustrate
the decomposition with some other variance restriction, suppose that
we know $\mathrm{var}(\varepsilon_{1t}) = 1.8$. Given the relationship between $\Sigma_\varepsilon$ and $\Sigma$ (i.e., $\Sigma_\varepsilon =
B\Sigma B'$), a restriction on the variances contained within $\Sigma_\varepsilon$ will always imply
multiple solutions for the coefficients of $B$. The first equation, (5.55), is a quadratic
in $b_{12}$ with the two solutions $b_{12} = 1$ and $b_{12} = -2.6$; unless we have a theoretical
reason to discard one of these magnitudes, there are two solutions to the
model. If $b_{12} = 1$, the remaining solutions are $b_{21} = -1$ and $\mathrm{var}(\varepsilon_{2t}) = 0.2$. If
$b_{12} = -2.6$, the solutions are $b_{21} = -5/3$ and $\mathrm{var}(\varepsilon_{2t}) = 5/9$.

The two solutions can be used to identify two different $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$
sequences, and innovation accounting can be performed using both solutions.
Even though there are two solutions, both satisfy the theoretical restriction
concerning $\mathrm{var}(\varepsilon_{1t})$. Rigobon and Sack (2004) illustrate how a volatility break
can be used to identify a structural VAR.

3. Symmetry restrictions. A linear combination of the coefficients and variances
can be used for identification purposes. Symmetry restrictions are popular
in open-economy models in that they allow a shock to have equal effects
across countries. As detailed in the Supplementary Manual, Enders and Souki
(2008) use symmetry restrictions to identify three country-specific shocks
and a global shock in a four-variable VAR. Consider the symmetry restriction
$b_{12} = b_{21}$. If we use equation (5.56), there are two solutions: $b_{12} = b_{21} =
-0.5$ or $b_{12} = b_{21} = -2.0$. For the first solution, we find $\mathrm{var}(\varepsilon_{1t}) = 0.225$ and,
for the second solution, $\mathrm{var}(\varepsilon_{1t}) = 0.9$. From the first solution,

$$\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix} = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix} \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

so that

t                        1       2       3       4       5
$\varepsilon_{1t}$     0.75    0.0     0.0    -0.75    0.0
$\varepsilon_{2t}$     0.0    -0.75    0.0     0.0     0.75


4. Sign restrictions. A new area of research concerns sign restrictions. For
example, suppose it is known that an oil price shock does not affect GDP
for the first two quarters after the shock. Similarly, suppose it is known that a
monetary shock has a positive effect on inflation. Mountford and Uhlig (2008)
show how such sign restrictions can be used in identification.
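The coefficient, variance, and symmetry restrictions above all amount to adding one equation to (5.55), (5.56), and (5.58). The following sketch, which assumes only the numbers of the running example, solves each case. The helper `solve_given_b12` is a hypothetical name introduced for illustration; it rearranges (5.56) to obtain $b_{21}$ once $b_{12}$ is known.

```python
import numpy as np

S11, S12, S22 = 0.5, 0.4, 0.5   # elements of Sigma from the example

def solve_given_b12(b12):
    """Use (5.56), (5.55), and (5.58) to get b21, var(eps1), var(eps2) from b12."""
    b21 = -(S12 + S22 * b12) / (S11 + S12 * b12)
    var1 = S11 + 2 * S12 * b12 + S22 * b12 ** 2
    var2 = S11 * b21 ** 2 + 2 * S12 * b21 + S22
    return b21, var1, var2

# 1. Coefficient restriction: b12 = 1.
coef = solve_given_b12(1.0)

# 2. Variance restriction: var(eps1) = 1.8 makes (5.55) a quadratic in b12.
b12_roots = np.sort(np.roots([S22, 2 * S12, S11 - 1.8]).real)
variance = [(b12,) + solve_given_b12(b12) for b12 in b12_roots]

# 3. Symmetry restriction: b12 = b21 = b turns (5.56) into a quadratic in b.
sym_roots = np.sort(np.roots([S12, S11 + S22, S12]).real)
symmetry = [(b,) + solve_given_b12(b) for b in sym_roots]

print("coefficient:", np.round(coef, 4))
for row in variance:
    print("variance:   ", np.round(row, 4))
for row in symmetry:
    print("symmetry:   ", np.round(row, 4))
```

Each printed row lists $b_{12}$ (where applicable), $b_{21}$, $\mathrm{var}(\varepsilon_1)$, and $\mathrm{var}(\varepsilon_2)$. The two quadratic cases make the multiplicity of solutions visible: the restriction alone does not pick one root over the other.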


12. OVERIDENTIFIED SYSTEMS 


It may be that economic theory suggests more than $(n^2 - n)/2$ restrictions. If so, it is
necessary to modify the method described earlier. The procedure for identifying an overidentified
system entails the following steps:


STEP 1: The restrictions on $B$ or $\mathrm{var}(\varepsilon_{it})$ do not affect the estimation of VAR
coefficients. Hence, estimate the unrestricted VAR $x_t = A_0 + A_1x_{t-1} + \cdots +
A_px_{t-p} + e_t$. Use the standard lag length and block-causality tests to help
determine the form of the VAR.

STEP 2: Obtain the unrestricted variance/covariance matrix $\Sigma$. The determinant of
this matrix is an indicator of the overall fit of the model.

STEP 3: Restricting $B$ and/or $\Sigma_\varepsilon$ will affect the estimate of $\Sigma$. Select the appropriate
restrictions and maximize the likelihood function with respect to the free
parameters of $B$ and $\Sigma_\varepsilon$. This will lead to an estimate of the restricted
variance/covariance matrix. Denote this second estimate by $\Sigma_R$. The difference
$|\Sigma_R| - |\Sigma|$ has a $\chi^2$ distribution with degrees of freedom equal to the number
of overidentifying restrictions.

For those wanting a more technical explanation, note that the log likelihood
function can be written as

$$-\frac{T}{2}\ln|\Sigma| - \frac{1}{2}\sum_{t=1}^{T}\left(e_t'\Sigma^{-1}e_t\right)$$

Fix each element of $e_t$ (and $e_t'$) at the level obtained using OLS; call
these estimated OLS residuals $\hat{e}_t$. Now use the relationship $\Sigma_\varepsilon = B\Sigma B'$ so
that the log likelihood function can be written as

$$-\frac{T}{2}\ln\left|B^{-1}\Sigma_\varepsilon (B')^{-1}\right| - \frac{1}{2}\sum_{t=1}^{T}\left(\hat{e}_t'B'\Sigma_\varepsilon^{-1}B\hat{e}_t\right)$$

Now select the restrictions on $B$ and $\Sigma_\varepsilon$ and maximize with respect to
the remaining free elements of these two matrices. The resulting estimates of
$B$ and $\Sigma_\varepsilon$ imply a value of $\Sigma$ that we have dubbed $\Sigma_R$. A number of popular
software packages can perform this type of estimation using the Generalized
Method of Moments.


STEP 4: If the restrictions are not binding, $\Sigma$ and $\Sigma_R$ will be equivalent. Let $R$ =
the number of overidentifying restrictions; that is, $R$ = the number of restrictions
exceeding $(n^2 - n)/2$. Then, the $\chi^2$ test statistic

$$\chi^2 = |\Sigma_R| - |\Sigma|$$

with $R$ degrees of freedom can be used to test the restricted system. If the
calculated value of $\chi^2$ exceeds that in a $\chi^2$ table, the restrictions can be rejected.
Now, allow for two sets of overidentifying restrictions such that the number
of restrictions in $R_2$ exceeds that in $R_1$. In fact, if $R_2 > R_1 > (n^2 - n)/2$, the
significance of the extra $R_2 - R_1$ restrictions can be tested as

$$\chi^2 = |\Sigma_{R_2}| - |\Sigma_{R_1}| \quad \text{with } R_2 - R_1 \text{ degrees of freedom}$$

Similarly, in an overidentified system, the t-statistic for the individual coefficients
can be obtained. Sims warned that the calculated standard errors may not be very
accurate. Also note that Waggoner and Zha (1997) point out that the normalization can have
important effects on statistical inference.


Two Examples 


Despite the so-called Great Recession, in December, 2011, the World Bank’s food price 
index was almost three times higher than it was in 2000. In Enders and Holt (2013), 
we tried to find the factors responsible for this general run-up in food prices with 
particular emphasis on the price of grain. Toward that end, we used a simple VAR con- 
taining measures of real energy prices, exchange rates, interest rates, and grain prices. 


Consider:

$$\begin{bmatrix} pe_t \\ ex_t \\ r_t \\ pg_t \end{bmatrix} = \begin{bmatrix} a_{10} \\ a_{20} \\ a_{30} \\ a_{40} \end{bmatrix} + \begin{bmatrix} A_{11}(L) & A_{12}(L) & A_{13}(L) & A_{14}(L) \\ A_{21}(L) & A_{22}(L) & A_{23}(L) & A_{24}(L) \\ A_{31}(L) & A_{32}(L) & A_{33}(L) & A_{34}(L) \\ A_{41}(L) & A_{42}(L) & A_{43}(L) & A_{44}(L) \end{bmatrix} \begin{bmatrix} pe_{t-1} \\ ex_{t-1} \\ r_{t-1} \\ pg_{t-1} \end{bmatrix} + \begin{bmatrix} e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \end{bmatrix}$$

where $pe_t$ = log of the World Bank's energy price index, $ex_t$ = the real trade weighted
price of the U.S. dollar, $r_t$ = 3-month T-bill rate adjusted for inflation, and $pg_t$ = log of
the World Bank's composite price index of grain. The prices of grain and energy were
deflated by the producer price index, the $a_{i0}$ are intercepts, the $A_{ij}(L)$ are polynomials
in the lag operator $L$, and the $e_{it}$ are the regression residuals. The estimation period
runs from January 1974 to December 2011. You can follow along using the data set
ENDERS_HOLT.XLS.

The first step in estimating the VAR is to determine the appropriate lag length. The 
choice is difficult because different methods of lag length selection yield very different 
optimal lag lengths. For example, the SBC selects two lags and the general-to-specific 
method selects 11 lags. We did not pursue the two-lag model since we were concerned 
that the short lag length would omit some important dynamics. In particular, it did not 
seem reasonable to believe that grain prices fully responded to energy, exchange rate, 
and interest rate changes in 2 months. Here, we can work with 7 lags since the results 
are very similar to those with 11 lags. 

Exact identification in a four-variable VAR requires six restrictions. However, we 
wanted to determine if the three macroeconomic variables, as a block, were causally 
prior to each other. Specifically, we wanted to test whether the following overidentified 
system with nine restrictions was consistent with the data: 


$$\begin{bmatrix} e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \end{bmatrix} = \begin{bmatrix} g_{11} & 0 & 0 & 0 \\ 0 & g_{22} & 0 & 0 \\ 0 & 0 & g_{33} & 0 \\ g_{41} & g_{42} & g_{43} & g_{44} \end{bmatrix} \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \\ \varepsilon_{4t} \end{bmatrix}$$

Imposing these nine restrictions resulted in a $\chi^2$ value of 13.53; with three degrees
of freedom (since there are three overidentifying restrictions), the prob-value was
0.003. As such, we could not treat shocks to the energy price, exchange rate, and
interest rate equations as pure structural shocks. Since the correlation between $e_{2t}$ and
$e_{3t}$ was high, it seemed likely that the rejection was due to the fact that we forced $g_{23}$
and/or $g_{32}$ to be zero. We tried the following identification using eight restrictions:


$$\begin{bmatrix} e_{1t} \\ e_{2t} \\ e_{3t} \\ e_{4t} \end{bmatrix} = \begin{bmatrix} g_{11} & 0 & 0 & 0 \\ 0 & g_{22} & g_{23} & 0 \\ 0 & 0 & g_{33} & 0 \\ g_{41} & g_{42} & g_{43} & g_{44} \end{bmatrix} \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \\ \varepsilon_{4t} \end{bmatrix}$$

Imposing these eight restrictions yielded a value of $\chi^2$ equal to 4.57; with two
degrees of freedom (since there are two overidentifying restrictions), the restriction is
not binding (the significance level is 0.102). As such, real grain prices are
contemporaneously affected by all variables, and the real exchange rate is contemporaneously
affected by real interest rate shocks. The innovations in real energy prices and interest
rates are due to their own pure shocks.
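The degrees of freedom and significance levels quoted for the two identification schemes can be checked with a few lines of code. This is a minimal sketch using only the standard library; `chi2_sf` is a hypothetical helper that evaluates the upper tail of a $\chi^2$ distribution through a standard recursion, and the test statistics are simply those reported in the text.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df, using
    sf(x; k+2) = sf(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        k, sf = 2, math.exp(-x / 2)                 # closed form for df = 2
    else:
        k, sf = 1, math.erfc(math.sqrt(x / 2))      # closed form for df = 1
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf

n = 4                                  # variables in the VAR
exact = (n ** 2 - n) // 2              # restrictions needed for exact identification
for n_restrictions, stat in ((9, 13.53), (8, 4.57)):
    df = n_restrictions - exact        # number of overidentifying restrictions
    print(n_restrictions, "restrictions: df =", df,
          "p-value =", round(chi2_sf(stat, df), 4))
```

The p-values computed this way are close to the reported significance levels; small differences can arise because the printed test statistics are rounded.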

The beauty of the identification scheme is that it seems quite plausible, and we do
not have to worry about imposing a set of ad hoc restrictions as in a Choleski
decomposition. The impulse responses of grain prices to one standard deviation positive shocks
to real energy prices, interest rates, and exchange rates are shown in Figure 5.9. Note
that the responses have been standardized by dividing each by the standard deviation
of the grain price shock (i.e., dividing by $\mathrm{var}(\varepsilon_{4t})^{0.5}$). At the onset, the energy price shock
(the solid line in the figure) does not affect grain prices. However, grain prices begin
to rise rapidly; by 6 months, the shock causes grain prices to rise by nearly 40%. Also
note that an appreciation of the dollar, making U.S. grain more expensive to foreigners,
causes the domestic price of grain to decline. There are at least two reasons that interest
rate increases could cause grain prices to fall. Interest rate shocks curtail demand and
increase the cost of holding grain in storage. As demand falls and grain inventories are
reduced, it is expected that the price of grain will decline. It turns out that the recent
period has been characterized by high energy prices and low interest rates. Moreover,
up until the end of the financial crisis, the dollar was quite weak. All three factors help
to explain the recent increase in real grain prices.

FIGURE 5.9 Responses of Grain to the Three Shocks


SIMS' MODEL To take another example, Sims (1986) used a six-variable VAR of
quarterly data over the period 1948Q1-1979Q3. The variables included in the study
are real GNP ($y$), real business fixed investment ($i$), the GNP deflator ($p$), the money
supply as measured by M1 ($m$), unemployment ($u$), and the treasury bill rate ($r$). An
unrestricted VAR was estimated with four lags of each variable and a constant term.
Sims obtained the 36 impulse response functions using a Choleski decomposition with
the ordering $y \to i \to p \to m \to u \to r$. Some of the impulse response functions had
reasonable interpretations. However, the response of real variables to a money supply
shock seemed unreasonable. The impulse responses suggested that a money supply
shock had little effect on prices, output, or the interest rate. Given a standard money
demand function, it is hard to explain why the public would be willing to hold the
expanded money supply. Sims proposed an alternative to the Choleski decomposition
that is consistent with money market equilibrium. Sims restricts the $B$ matrix such that

$$\begin{bmatrix} 1 & b_{12} & 0 & 0 & 0 & 0 \\ b_{21} & 1 & b_{23} & b_{24} & 0 & 0 \\ b_{31} & 0 & 1 & 0 & 0 & b_{36} \\ b_{41} & 0 & b_{43} & 1 & 0 & b_{46} \\ b_{51} & 0 & b_{53} & b_{54} & 1 & b_{56} \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r_t \\ m_t \\ y_t \\ p_t \\ u_t \\ i_t \end{bmatrix} = \begin{bmatrix} \varepsilon_{rt} \\ \varepsilon_{mt} \\ \varepsilon_{yt} \\ \varepsilon_{pt} \\ \varepsilon_{ut} \\ \varepsilon_{it} \end{bmatrix}$$

Notice that there are 17 zero restrictions on the $b_{ij}$. The system is overidentified;
with six variables, exact identification requires only $(6^2 - 6)/2 = 15$ restrictions.
Imposing these restrictions, Sims identifies the following six relationships among the
contemporaneous innovations:

$$r_t = 71.20m_t + \varepsilon_{rt} \tag{5.59}$$
$$m_t = 0.283y_t + 0.224p_t - 0.0081r_t + \varepsilon_{mt} \tag{5.60}$$
$$y_t = -0.00135r_t + 0.132i_t + \varepsilon_{yt} \tag{5.61}$$
$$p_t = -0.0010r_t + 0.045y_t - 0.00364i_t + \varepsilon_{pt} \tag{5.62}$$
$$u_t = -0.116r_t - 20.1y_t - 1.48i_t - 8.98p_t + \varepsilon_{ut} \tag{5.63}$$
$$i_t = \varepsilon_{it} \tag{5.64}$$

Sims views (5.59) and (5.60) as money supply and demand functions, respectively. 
In (5.59), the money supply rises as the interest rate increases. The demand for money 
in (5.60) is positively related to income and the price level and negatively related to 
the interest rate. Investment innovations in (5.64) are completely autonomous. Other- 
wise, Sims sees no reason to restrict the other equations in any particular manner. For 
simplicity, he chooses a Choleski-type block structure for GNP, the price level, and the 
unemployment rate. The impulse response functions appear to be consistent with the 
notion that money supply shocks affect the prices, income, and interest rates. 
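As a quick check on the counting argument, the sketch below writes Sims' six relationships in the form $Bx_t = \varepsilon_t$ and counts the off-diagonal zeros. The numerical entries are taken from (5.59) through (5.64) with each right-hand-side term moved to the left; the variable names are illustrative.

```python
import numpy as np

# Variable order follows the text: r, m, y, p, u, i.
names = ["r", "m", "y", "p", "u", "i"]
B = np.eye(6)
# Move each right-hand-side term to the left so that B @ x_t = eps_t.
B[0, 1] = -71.20                                  # (5.59): r on m
B[1, [2, 3, 0]] = [-0.283, -0.224, 0.0081]        # (5.60): m on y, p, r
B[2, [0, 5]] = [0.00135, -0.132]                  # (5.61): y on r, i
B[3, [0, 2, 5]] = [0.0010, -0.045, 0.00364]       # (5.62): p on r, y, i
B[4, [0, 2, 5, 3]] = [0.116, 20.1, 1.48, 8.98]    # (5.63): u on r, y, i, p
# The last row, (5.64), stays as the identity row: investment is autonomous.

n = B.shape[0]
off_diag = ~np.eye(n, dtype=bool)
zero_restrictions = int(np.sum((B == 0) & off_diag))
needed = (n ** 2 - n) // 2
print("zero restrictions:", zero_restrictions)
print("needed for exact identification:", needed)
print("overidentifying restrictions:", zero_restrictions - needed)
```

Counting the zeros programmatically is a convenient safeguard when the restriction pattern is not triangular, since it is easy to miscount by eye.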


13. THE BLANCHARD-QUAH DECOMPOSITION 


Blanchard and Quah (1989) provide an alternative way to obtain a structural VAR. Their 
aim is to reconsider the Beveridge and Nelson (1981) decomposition of real GNP into 
its temporary and permanent components. Toward this end, they developed a macroe- 
conomic model such that real GNP is affected by demand-side and supply-side distur- 
bances. In accordance with the natural rate hypothesis, demand-side disturbances have 
no long-run effect on real GNP. On the supply side, productivity shocks are assumed to 
have permanent effects on output. Using a bivariate VAR, Blanchard and Quah show 
how to decompose real GNP and recover the two pure shocks. 

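To take a general example, suppose we are interested in decomposing an $I(1)$
sequence, say $\{y_t\}$, into its temporary and permanent components. Let there be a second
variable $\{z_t\}$ that is affected by the same two shocks. For the time being, suppose
that $\{z_t\}$ is stationary. If we ignore the intercept terms, the bivariate moving average
(BMA) representation of the $\{y_t\}$ and $\{z_t\}$ sequences will have the form

$$\Delta y_t = \sum_{k=0}^{\infty} c_{11}(k)\varepsilon_{1t-k} + \sum_{k=0}^{\infty} c_{12}(k)\varepsilon_{2t-k} \tag{5.65}$$

$$z_t = \sum_{k=0}^{\infty} c_{21}(k)\varepsilon_{1t-k} + \sum_{k=0}^{\infty} c_{22}(k)\varepsilon_{2t-k} \tag{5.66}$$

or, in a more compact form,

$$\begin{bmatrix} \Delta y_t \\ z_t \end{bmatrix} = \begin{bmatrix} C_{11}(L) & C_{12}(L) \\ C_{21}(L) & C_{22}(L) \end{bmatrix} \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}$$

where $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are independent white-noise disturbances, each having a constant
variance, and the $C_{ij}(L)$ are polynomials in the lag operator $L$ such that the individual
coefficients of $C_{ij}(L)$ are denoted by $c_{ij}(k)$. For example, the third coefficient of $C_{21}(L)$
is $c_{21}(3)$. For convenience, the time subscripts on the variances and the covariance terms
are dropped, and the shocks are normalized so that $\mathrm{var}(\varepsilon_1) = 1$ and $\mathrm{var}(\varepsilon_2) = 1$. If we
call $\Sigma_\varepsilon$ the variance/covariance matrix of the innovations, it follows that

$$\Sigma_\varepsilon = \begin{bmatrix} \mathrm{var}(\varepsilon_1) & \mathrm{cov}(\varepsilon_1, \varepsilon_2) \\ \mathrm{cov}(\varepsilon_1, \varepsilon_2) & \mathrm{var}(\varepsilon_2) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$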


In order to use the Blanchard and Quah (BQ) technique, at least one of the variables
must be nonstationary since $I(0)$ variables do not have a permanent component.
However, to use the method, both variables must be in a stationary form. Since $\{y_t\}$ is
$I(1)$, (5.65) uses the first difference of the series. Note that (5.66) implies that the $\{z_t\}$
sequence is $I(0)$; if, in your own work, you find $\{z_t\}$ is also $I(1)$, use its first difference.

In contrast to the Sims-Bernanke procedure, Blanchard and Quah do not directly
associate the $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ shocks with the $\{y_t\}$ and $\{z_t\}$ sequences. Instead, the $\{y_t\}$
and $\{z_t\}$ sequences are the endogenous variables, and the $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ sequences
represent what an economic theorist would call the exogenous variables. In their example,
$y_t$ is the logarithm of real GNP, $z_t$ is unemployment, $\varepsilon_{1t}$ is an aggregate demand shock,
and $\varepsilon_{2t}$ is an aggregate supply shock. The coefficients of $C_{11}(L)$, for example, represent
the impulse responses of an aggregate demand shock on the time path of change in the
log of real GNP.

The key to decomposing the $\{y_t\}$ sequence into its permanent and stationary
components is to assume that one of the shocks has a temporary effect on the $\{y_t\}$ sequence.
It is this dichotomy between temporary and permanent effects that allows for the
complete identification of the structural innovations from an estimated VAR. For example,
Blanchard and Quah assume that an aggregate demand shock has no long-run effect on
real GNP. In the long run, if real GNP is to be unaffected by the demand shock, it must
be the case that the cumulated effect of an $\varepsilon_{1t}$ shock on the $\Delta y_t$ sequence must be equal
to zero. Hence, the coefficients $c_{11}(k)$ in (5.65) must be such that

$$\sum_{k=0}^{\infty} c_{11}(k)\varepsilon_{1t-k} = 0$$

Since this must hold for any possible realization of the $\{\varepsilon_{1t}\}$ sequence, it must be
the case that

$$\sum_{k=0}^{\infty} c_{11}(k) = 0 \tag{5.67}$$
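A short worked step may help show why the sum of the $c_{11}(k)$ is the relevant quantity. It uses only the definitions in (5.65): the response of $\Delta y_{t+k}$ to a one-unit $\varepsilon_{1t}$ shock is $c_{11}(k)$, and the level of the series is the accumulation of its changes, so that

$$\frac{\partial y_{t+n}}{\partial \varepsilon_{1t}} = \sum_{k=0}^{n} \frac{\partial \Delta y_{t+k}}{\partial \varepsilon_{1t}} = \sum_{k=0}^{n} c_{11}(k) \;\longrightarrow\; \sum_{k=0}^{\infty} c_{11}(k) \quad \text{as } n \to \infty$$

Requiring this limit to be zero is therefore the same as requiring that the demand shock leave the level of $y_t$ unchanged in the long run, even though it may move $y_t$ at any finite horizon.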


Since the demand-side and supply-side shocks are not observed, the problem is to
recover them from a VAR estimation. Given that the variables are stationary, we know
that there exists a VAR representation of the form:

$$\begin{bmatrix} \Delta y_t \\ z_t \end{bmatrix} = \begin{bmatrix} A_{11}(L) & A_{12}(L) \\ A_{21}(L) & A_{22}(L) \end{bmatrix} \begin{bmatrix} \Delta y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} \tag{5.68}$$

or, to use a more compact notation,

$$x_t = A(L)x_{t-1} + e_t$$

where $x_t$ is the column vector $(\Delta y_t, z_t)'$, $e_t$ is the column vector $(e_{1t}, e_{2t})'$, $A(L)$ is the
2 x 2 matrix with elements equal to the polynomials $A_{ij}(L)$, and the coefficients of $A_{ij}(L)$
are denoted by $a_{ij}(k)$.

The critical insight is that the VAR residuals are composites of the pure innovations
$\varepsilon_{1t}$ and $\varepsilon_{2t}$. For example, $e_{1t}$ is the one-step-ahead forecast error of $y_t$; that is,
$e_{1t} = \Delta y_t - E_{t-1}\Delta y_t$. From the BMA, the one-step-ahead forecast error is $c_{11}(0)\varepsilon_{1t} +
c_{12}(0)\varepsilon_{2t}$. Since the two representations are equivalent, it must be the case that

$$e_{1t} = c_{11}(0)\varepsilon_{1t} + c_{12}(0)\varepsilon_{2t} \tag{5.69}$$

Similarly, since $e_{2t}$ is the one-step-ahead forecast error of $z_t$,

$$e_{2t} = c_{21}(0)\varepsilon_{1t} + c_{22}(0)\varepsilon_{2t} \tag{5.70}$$

or, combining (5.69) and (5.70), we get

$$\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} = \begin{bmatrix} c_{11}(0) & c_{12}(0) \\ c_{21}(0) & c_{22}(0) \end{bmatrix} \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{bmatrix}$$

If $c_{11}(0)$, $c_{12}(0)$, $c_{21}(0)$, and $c_{22}(0)$ were known, it would be possible to recover
$\varepsilon_{1t}$ and $\varepsilon_{2t}$ from the regression residuals $e_{1t}$ and $e_{2t}$. Blanchard and Quah show that the
relationship between (5.68) and the BMA model plus the long-run restriction of (5.67)
provide exactly four restrictions that can be used to identify these four coefficients. The
VAR residuals can be used to construct estimates of $\mathrm{var}(e_1)$, $\mathrm{var}(e_2)$, and $\mathrm{cov}(e_1, e_2)$.
Hence, there are the following four restrictions:


RESTRICTION 1

Given (5.69) and noting that $E\varepsilon_{1t}\varepsilon_{2t} = 0$, the normalization $\mathrm{var}(\varepsilon_1) = \mathrm{var}(\varepsilon_2) = 1$
means that the variance of $e_{1t}$ is

$$\mathrm{var}(e_1) = c_{11}(0)^2 + c_{12}(0)^2 \tag{5.71}$$

RESTRICTION 2

Similarly, using (5.70), the variance of $e_{2t}$ is related to $c_{21}(0)$ and $c_{22}(0)$ as

$$\mathrm{var}(e_2) = c_{21}(0)^2 + c_{22}(0)^2 \tag{5.72}$$

RESTRICTION 3

The product of $e_{1t}$ and $e_{2t}$ is

$$e_{1t}e_{2t} = [c_{11}(0)\varepsilon_{1t} + c_{12}(0)\varepsilon_{2t}][c_{21}(0)\varepsilon_{1t} + c_{22}(0)\varepsilon_{2t}]$$

If we take the expectation, the covariance of the VAR residuals is

$$Ee_{1t}e_{2t} = c_{11}(0)c_{21}(0) + c_{12}(0)c_{22}(0) \tag{5.73}$$


Thus, (5.71) to (5.73) can be viewed as three equations in the four unknowns
$c_{11}(0)$, $c_{12}(0)$, $c_{21}(0)$, and $c_{22}(0)$. The fourth restriction is embedded in the
assumption that the $\{\varepsilon_{1t}\}$ sequence has no long-run effect on the $\{y_t\}$ sequence. The
problem is to transform the restriction (5.67) into its VAR representation. Since
the algebra is a bit messy, it is helpful to rewrite (5.68) as

$$x_t = A(L)Lx_t + e_t$$

so that

$$[I - A(L)L]x_t = e_t$$

and, by premultiplying by $[I - A(L)L]^{-1}$, we obtain

$$x_t = [I - A(L)L]^{-1}e_t \tag{5.74}$$


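Denote the determinant of $[I - A(L)L]$ by the expression $D$. It should not
take too long to convince yourself that (5.74) can be written as

$$\begin{bmatrix} \Delta y_t \\ z_t \end{bmatrix} = \frac{1}{D}\begin{bmatrix} 1 - A_{22}(L)L & A_{12}(L)L \\ A_{21}(L)L & 1 - A_{11}(L)L \end{bmatrix} \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

or, using the definitions of the $A_{ij}(L)$, we get

$$\begin{bmatrix} \Delta y_t \\ z_t \end{bmatrix} = \frac{1}{D}\begin{bmatrix} 1 - \Sigma a_{22}(k)L^{k+1} & \Sigma a_{12}(k)L^{k+1} \\ \Sigma a_{21}(k)L^{k+1} & 1 - \Sigma a_{11}(k)L^{k+1} \end{bmatrix} \begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}$$

where the summations run from $k = 0$ to infinity.

Thus, the solution for $\Delta y_t$ in terms of the current and lagged values of $\{e_{1t}\}$
and $\{e_{2t}\}$ is

$$\Delta y_t = \frac{1}{D}\left\{\left[1 - \sum_{k=0}^{\infty} a_{22}(k)L^{k+1}\right]e_{1t} + \sum_{k=0}^{\infty} a_{12}(k)L^{k+1}e_{2t}\right\} \tag{5.75}$$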
Now $e_{1t}$ and $e_{2t}$ can be replaced by (5.69) and (5.70). Making these substitutions,
the restriction that the $\{\varepsilon_{1t}\}$ sequence has no long-run effect on $y_t$ is

$$\left[1 - \sum_{k=0}^{\infty} a_{22}(k)L^{k+1}\right]c_{11}(0)\varepsilon_{1t} + \sum_{k=0}^{\infty} a_{12}(k)L^{k+1}c_{21}(0)\varepsilon_{1t} = 0$$

RESTRICTION 4

For all possible realizations of the $\{\varepsilon_{1t}\}$ sequence, $\varepsilon_{1t}$ shocks will have only
temporary effects on the $\Delta y_t$ sequence (and on $y_t$ itself) if

$$\left[1 - \sum_{k=0}^{\infty} a_{22}(k)\right]c_{11}(0) + \sum_{k=0}^{\infty} a_{12}(k)c_{21}(0) = 0$$


With this fourth restriction, there are four equations that can be used to identify
the unknown values $c_{11}(0)$, $c_{12}(0)$, $c_{21}(0)$, and $c_{22}(0)$. To summarize, the steps in the
procedure are as follows:


STEP 1: Begin by pretesting the two variables for time trends and unit roots. If $\{y_t\}$
does not have a unit root, there is no reason to proceed with the decomposition.
Appropriately transform the two variables so that the resulting
sequences are both $I(0)$. Perform lag length tests to find a reasonable specification
for the VAR. The residuals of the estimated VAR should pass the
standard diagnostic checks for white-noise processes (of course, $e_{1t}$ and $e_{2t}$
can be correlated with each other).

STEP 2: Using the residuals of the estimated VAR, calculate the variance/covariance
matrix; that is, calculate $\mathrm{var}(e_1)$, $\mathrm{var}(e_2)$, and $\mathrm{cov}(e_1, e_2)$. Also, calculate the sums

$$1 - \sum_{k=0}^{p} a_{22}(k) \quad \text{and} \quad \sum_{k=0}^{p} a_{12}(k)$$

where $p$ = lag length used to estimate the VAR.
Use these values to solve the following four equations for $c_{11}(0)$, $c_{12}(0)$,
$c_{21}(0)$, and $c_{22}(0)$:

$$\mathrm{var}(e_1) = c_{11}(0)^2 + c_{12}(0)^2$$
$$\mathrm{var}(e_2) = c_{21}(0)^2 + c_{22}(0)^2$$
$$\mathrm{cov}(e_1, e_2) = c_{11}(0)c_{21}(0) + c_{12}(0)c_{22}(0)$$
$$0 = c_{11}(0)\left[1 - \Sigma a_{22}(k)\right] + c_{21}(0)\Sigma a_{12}(k)$$

Given these four values $c_{ij}(0)$ and the residuals of the VAR, the entire $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$
sequences can be identified using the formulas

$$e_{1t-i} = c_{11}(0)\varepsilon_{1t-i} + c_{12}(0)\varepsilon_{2t-i}$$

and

$$e_{2t-i} = c_{21}(0)\varepsilon_{1t-i} + c_{22}(0)\varepsilon_{2t-i}$$


STEP 3: As in a traditional VAR, the identified $\{\varepsilon_{1t}\}$ and $\{\varepsilon_{2t}\}$ sequences can be used
to obtain impulse response functions and variance decompositions. The
difference is that the interpretation of the impulses is straightforward. For
example, Blanchard and Quah are able to obtain the impulse responses of the
change in the log of real GNP to a typical supply-side shock. Moreover, it is
possible to obtain the historical decomposition of each series. For example,
set all $\{\varepsilon_{1t}\}$ shocks equal to zero and use the actual $\{\varepsilon_{2t}\}$ series (i.e., use the
identified values of $\varepsilon_{2t}$) to obtain the permanent changes in $\{y_t\}$ as

$$\Delta y_t = \sum_{k=0}^{\infty} c_{12}(k)\varepsilon_{2t-k}$$
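The identification in Step 2 can be sketched numerically. Rather than solving the four nonlinear equations directly, the sketch below uses an equivalent and commonly used route: it factors the long-run covariance matrix with a Choleski decomposition and then maps the factor back to the impact coefficients. The coefficient sums in `A1` and the matrix `Sigma` are hypothetical values chosen only for illustration; they do not come from the text, and the sign of each column of the result is an arbitrary normalization.

```python
import numpy as np

# Hypothetical inputs (not from the text): A1 holds the sums of the VAR
# coefficients, A1[i, j] = sum_k a_ij(k); Sigma is the residual covariance.
A1 = np.array([[0.3, -0.2],
               [0.1,  0.6]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

I2 = np.eye(2)
F = np.linalg.inv(I2 - A1)          # long-run multiplier [I - A(1)]^{-1}
M = F @ Sigma @ F.T                 # long-run covariance, equal to C(1) C(1)'

# Factor M as P P' with P[0, 0] = 0, so that shock 1 has no long-run effect
# on variable 1: reverse the columns of the lower-triangular Choleski factor.
P = np.linalg.cholesky(M)[:, ::-1]
C0 = (I2 - A1) @ P                  # impact matrix with elements c_ij(0)

# Restriction 4 as written in the text; it should be zero up to rounding.
r4 = C0[0, 0] * (1 - A1[1, 1]) + C0[1, 0] * A1[0, 1]

def recover_shocks(e):
    """Map a 2 x T array of VAR residuals into the structural shocks."""
    return np.linalg.solve(C0, e)

print("C(0) =\n", np.round(C0, 4))
print("restriction 4 residual:", r4)
```

Because $C(0)C(0)' = \Sigma$ by construction, the first three restrictions hold automatically, and the zero in the upper-left corner of the long-run matrix delivers the fourth.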


The Blanchard and Quah Results 


In their study, Blanchard and Quah (1989) used the first difference of the logarithm
of real GNP and the level of unemployment. They noted that unemployment exhibits
an apparent time trend and that there is a slowdown in real growth beginning in the
mid-1970s. Since there is no obvious way to address these difficult issues, they
estimated four different VARs. Two include a dummy allowing for the change in the rate of
growth in output, and two include a deterministic time trend in unemployment. Using
quarterly GNP and unemployment data over the period 1950Q2-1987Q4, they
estimated a VAR with eight lags.

Imposing the restriction that demand-side shocks have no long-run effect on real 
GNP, Blanchard and Quah identified the two types of shocks. The impulse response 
functions for the four VARs are quite similar: 


• The time paths of demand-side disturbances on output and unemployment
are hump shaped. The impulse responses are mirror images of each other;
initially, output increases while unemployment decreases. The effects peak
after four quarters; afterward, they converge to their original levels.

• Supply-side disturbances have a cumulative effect on output. A supply
disturbance having a positive effect on output has a small positive initial effect on
unemployment. After this initial increase, unemployment steadily decreases
and the cumulated change becomes negative after four quarters.
Unemployment remains below its long-run level for nearly 5 years.


Blanchard and Quah found that the alternative methods of treating the slowdown in
output growth and the trend in unemployment affect the variance decompositions. Since
the goal here is to illustrate the technique, consider only the variance decomposition
using a dummy variable for the decline in output growth and detrended unemployment.

Percent of Forecast Error Variance Due to Demand-Side Shocks

Horizon (Quarters)    Output    Unemployment
         1             99.0         51.9
         4             97.9         80.2
        12             67.6         86.2
        40             39.3         85.6

At short-run horizons, the huge preponderance of the variation in output is due 
to demand-side innovations. Demand shocks account for almost all of the movement 
in GNP at short horizons. Since demand shock effects are necessarily temporary, the 
findings contradict those of Beveridge and Nelson. The proportion of the forecast error 
variance falls steadily as the forecast horizon increases; the proportion converges to 
zero since these effects are temporary. Consequently, the contribution of supply-side 
innovations to real GNP movements increases at longer forecasting horizons. On the 
other hand, demand-side shocks generally account for increasing proportions of the 
variation in unemployment at longer forecasting horizons. 


14. DECOMPOSING REAL AND NOMINAL 
EXCHANGE RATES: AN EXAMPLE 


In the study by Enders and Lee (1997), we decomposed real and nominal exchange 
rate movements into the components induced by real and nominal factors. This section 
presents a small portion of the paper in order to further illustrate the methodology of 
the Blanchard and Quah technique. The results reported below are updated through 
2013Q1 using the data in the file labeled EXRATES.XLS. One aim of the study is to 
explain the deviations from purchasing power parity. As in Chapter 4, the real value of 
the Canadian dollar (r,) can be defined as 


$$r_t = e_t + p_t^* - p_t$$

where $p_t^*$ and $p_t$ refer to the logarithms of U.S. and Canadian wholesale price indices,
respectively, and $e_t$ is the logarithm of the Canadian/U.S. dollar nominal exchange rate.

To explain the deviations from PPP, we suppose that there are two types of shocks:
a real shock and a nominal shock. The theory suggests that real shocks can cause
permanent changes in the real exchange rate but that nominal shocks can cause only temporary
movements in the real rate. For example, in the long run, if Canada doubles its nominal
money supply, the Canadian price level and the exchange rate will both double (i.e., $p_t$
and $e_t$ will double). Hence, in the long run, the real exchange rate remains invariant to
a money supply shock.
For Step 1, we perform various unit root tests on the quarterly Canadian/U.S.
dollar real and nominal exchange rates over the 1973Q1-2013Q1 period. Consistent
with other studies focusing on the post-Bretton Woods period, it is clear that real
and nominal rates can be characterized by nonstationary processes. If you follow the
general-to-specific approach and use a single lag of $\Delta r_t$ in an augmented Dickey-Fuller
test, you should find that the coefficient on $r_{t-1}$ is -0.063 with a t-statistic of -2.59.
Whether the null hypothesis of a unit root in the real exchange rate can be rejected is
important; if the $\{r_t\}$ series is stationary, it has no permanent component. Although many
researchers argue that nominal exchange rates should act as $I(1)$ processes, it is worthwhile
to formally test this claim using an ADF test. Again, follow the general-to-specific
methodology and estimate a model with one lagged change of $e_t$ to obtain

$$\Delta e_t = \underset{(1.48)}{0.005} - \underset{(-1.76)}{0.025}\,e_{t-1} + \underset{(4.59)}{0.345}\,\Delta e_{t-1}$$

As such, it seems reasonable to proceed treating the $\{r_t\}$ and $\{e_t\}$ series as $I(1)$
processes. The BMA model has the form:

$$\begin{bmatrix} \Delta r_t \\ \Delta e_t \end{bmatrix} = \begin{bmatrix} C_{11}(L) & C_{12}(L) \\ C_{21}(L) & C_{22}(L) \end{bmatrix} \begin{bmatrix} \varepsilon_{rt} \\ \varepsilon_{nt} \end{bmatrix}$$

where $\varepsilon_{rt}$ and $\varepsilon_{nt}$ represent the zero-mean mutually uncorrelated real and nominal
shocks, respectively.
The restriction that the nominal shocks have no long-run effect on the real
exchange rate is represented by the restriction that the coefficients in $C_{12}(L)$ sum to
zero; thus, if $c_{12}(k)$ is the $k$th coefficient in $C_{12}(L)$, as in (5.67), the restriction is

$$\sum_{k=0}^{\infty} c_{12}(k) = 0 \tag{5.76}$$

The restriction in (5.76) implies that the cumulative effect of $\varepsilon_{nt}$ on $\Delta r_t$ is zero and,
consequently, that the long-run effect of $\varepsilon_{nt}$ on the level of $r_t$ itself is zero. Put another
way, the nominal shock $\varepsilon_{nt}$ has only short-run effects on the real exchange rate. Note
that there is no restriction on the effects of a real shock on the real rate or effects of
either real or nominal shocks on the nominal exchange rate.

For Step 2, we estimate a bivariate VAR model for several lag lengths. Likelihood
ratio tests indicate that a VAR model with three lags is appropriate. For example, if you
compare the three lag and one lag models you should find that $\ln(|\Sigma_3|) = -16.934$,
$\ln(|\Sigma_1|) = -16.823$, the number of coefficients in each equation of the three lag model
is 7, and the number of usable observations is 157. Using these values, (5.44) becomes

$$(157 - 7)[-16.823 - (-16.934)] = 16.63$$

If you compare this value to a $\chi^2$ distribution with 8 degrees of freedom, you will
find that the restriction is binding at the 0.034 significance level. As such, we would
conclude that the three-lag model is appropriate. The AIC and SBC with three lags are
-2630.64 and -2587.85, whereas the AIC and SBC with one lag are -2629.23 and
-2610.90, respectively. As such, the AIC also selects the three-lag model, whereas the
SBC selects the model with only one lag.
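The lag-length comparison can be reproduced approximately from the quantities quoted above. The sketch below is illustrative only: it takes the two log determinants as reported (rounded to three decimals) and assumes the multivariate forms AIC = $T\ln|\Sigma| + 2N$ and SBC = $T\ln|\Sigma| + N\ln(T)$, where $N$ is the total number of estimated coefficients. That choice of formula is an assumption here, although it matches the reported three-lag values to two decimals.

```python
import math

T = 157                                 # usable observations
logdet = {3: -16.934, 1: -16.823}       # ln|Sigma| as reported (rounded)
n_vars = 2

def n_coef_per_eq(lags):
    """Lags of each variable plus an intercept in every equation."""
    return n_vars * lags + 1

c = n_coef_per_eq(3)                    # 7 coefficients per equation
lr = (T - c) * (logdet[1] - logdet[3])  # likelihood ratio statistic
df = n_vars ** 2 * (3 - 1)              # 8 coefficients excluded in total

# Chi-square upper tail for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
p_value = math.exp(-lr / 2) * sum((lr / 2) ** j / math.factorial(j)
                                  for j in range(df // 2))

def aic(lags):
    return T * logdet[lags] + 2 * n_vars * n_coef_per_eq(lags)

def sbc(lags):
    return T * logdet[lags] + n_vars * n_coef_per_eq(lags) * math.log(T)

print("LR statistic:", round(lr, 2), "p-value:", round(p_value, 3))
for lags in (3, 1):
    print(lags, "lag(s): AIC =", round(aic(lags), 2), "SBC =", round(sbc(lags), 2))
```

With the rounded log determinants, the statistic and the one-lag criteria differ slightly in the second decimal from the values in the text, presumably because the text uses unrounded determinants. The ranking of the models is unaffected.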

Since the lag length selection methods give conflicting answers, a careful 
researcher might want to perform the analysis using both lag lengths. For ease of 
exposition, the text reports results using only the three-lag model. You can use the data 
in the file EXRATES.XLS to see if the key results are dependent on the lag length. 

 The variance decompositions using a standard Choleski decomposition are
shown in the second and third columns of the table below. The ordering is such
that the nominal exchange rate has no contemporaneous effect on the real rate. The
decompositions using the Blanchard–Quah decomposition are given in the fourth
and fifth columns. The table shows the percentages of the forecast error variances
accounted for by the real shock $\varepsilon_{rt}$.


Comparison of Choleski and BQ Decompositions

                      Choleski            Blanchard–Quah
Horizon            Δr_t     Δe_t         Δr_t     Δe_t
One quarter       100.0    73.93        88.31    40.11
Four quarters     94.69    73.16        83.36    42.26
Eight quarters    94.61    73.06        83.91    42.19


If we use the Choleski decomposition, it is immediately evident that real shocks 
explain almost all of the forecast error variance of the real exchange rate at any forecast 
horizon. Nominal shocks account for approximately 26% of the forecast error variance
of the nominal exchange rate. The interpretation is that real shocks are responsible
for movements in real and nominal exchange rates. As such, we should expect both 
rates to display sizeable comovements. The effect of using the BQ decomposition is 
such that real shocks explain a smaller proportion of the forecast error variance of both 
exchange rates. This is particularly true for the nominal exchange rate. 

Figure 5.10 shows the impulse response functions of the real and nominal exchange 
rates to both types of shocks. For clarity, the results are shown for the levels of exchange 
rates (as opposed to first differences) measured in terms of standard deviations. For 
example, the standardized responses of the real exchange rate are obtained by dividing
each real exchange rate response by the standard deviation of the residuals from the real
exchange rate equation.
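
The two transformations described here (cumulating the responses of the differenced series into level responses, then dividing by the residual standard deviation) can be sketched as follows. The numbers are made up for illustration and are not taken from the estimated model; the function name is likewise hypothetical.

```python
from itertools import accumulate

def standardized_level_responses(diff_responses, resid_std):
    """Cumulate responses of a differenced series into level responses,
    then express them in units of the residual standard deviation."""
    if resid_std <= 0:
        raise ValueError("resid_std must be positive")
    return [level / resid_std for level in accumulate(diff_responses)]

# Hypothetical responses of the CHANGE in the real rate to a shock (made-up numbers).
delta_r_responses = [0.020, 0.006, -0.002, 0.001]
resid_std = 0.02  # hypothetical standard deviation of the real-rate residuals

levels = standardized_level_responses(delta_r_responses, resid_std)
print([round(x, 2) for x in levels])
```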


1. Consider a real shock that creates an increase in the relative demand for the 
U.S. good. The effect of such a “real” shock is to cause an immediate increase 
in the real and nominal exchange rates. It is interesting to note that the initial 
movements in the real value of the dollar are greater than those of the nominal 
dollar. Moreover, these changes are all of a permanent nature. Real and nom- 
inal rates converge to their new long-run levels in about seven quarters. Since 
the long-run change in $r_t$ and $e_t$ is nearly identical, the implication is that the
price ratio $p_t - p_t^*$ shows very little response to a real shock.

FIGURE 5.10 Responses of Real and Nominal Exchange Rates (panel title: Responses to the Real Shock)


2. In response to a nominal shock (such as a relative increase in the Canadian 
money supply), the movement in the nominal exchange rate to its long-run 
level is almost immediate. There is only mild evidence of nominal exchange
rate overshooting: after one period the nominal rate rises from about 0.8 to
1.0 and then returns to its new long-run level. As required by our identification
restriction, the effect of a nominal shock on the real exchange rate is
necessarily temporary. Nevertheless, even the short-run changes in the real
rate show very little response to a nominal shock; the maximal change is
only 0.4 standard deviations. The implication is that $p_t - p_t^*$ adjusts to offset
the change in the nominal exchange rate.


Limitations of the Technique 


A problem with this type of decomposition is that there are many types of shocks. As 
recognized by Blanchard and Quah (1989), the approach is limited by its ability to 
identify at most only as many types of distinct shocks as there are variables. Blanchard 
and Quah proved several propositions that are somewhat helpful when the presence 
of three or more structural shocks is suspected. Suppose that there are several
disturbances having permanent effects but only one having a temporary effect on $\{y_t\}$. If
the variance of one type of permanent disturbance grows “arbitrarily” small relative to
the other, then the decomposition scheme approaches the correct decomposition. The 
second proposition they proved is that if there are multiple permanent disturbances 
(temporary disturbances), the correct decomposition is possible if and only if the indi- 
vidual distributed lag responses in the real and nominal exchange rate are sufficiently 
similar across equations. By “sufficiently similar,” Blanchard and Quah mean that the 
coefficients may differ up to a scalar lag distribution. Yet, both propositions essen- 
tially imply that there are only two types of disturbances. For the first proposition, the 
third disturbance must be arbitrarily small. For the second proposition, the third dis- 
turbance must have a sufficiently similar path to one of the others. It is wise to avoid 
such a decomposition when the presence of three or more important disturbances is 
suspected. Alternatively, as in the study by Clarida and Gali (1994), you might be able 
to develop a model implying three long-run restrictions among three variables. 

 A second problem is that the Blanchard–Quah restrictions produce a system of
quadratic equations so that the signs of the $c_{ij}(0)$ are not identified. Moreover, in a
system with many variables, there can be many solutions to the nonlinear system 
of equations. In these circumstances, Taylor (2003) recommends the use of overi- 
dentifying restrictions or those normalizations that are consistent with an underlying 
economic model. 
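
The sign problem can be seen in a small numerical sketch. The code below implements one common way of computing a bivariate long-run (Blanchard–Quah type) decomposition, namely a Choleski factorization of the long-run covariance matrix; it is offered as an illustration and is not necessarily the computation used for the results in the text. All inputs are hypothetical: `a_sum` stands for the sum of the VAR coefficient matrices and `sigma` for the residual covariance matrix, with structural shocks normalized to unit variance.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def cholesky_lower(s):
    """Lower-triangular factor of a 2 x 2 positive definite matrix."""
    l11 = math.sqrt(s[0][0])
    l21 = s[1][0] / l11
    l22 = math.sqrt(s[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def long_run_multiplier(a_sum):
    """Reduced-form long-run multiplier (I - A_1 - ... - A_p)^(-1)."""
    return inverse([[(1.0 if i == j else 0.0) - a_sum[i][j] for j in range(2)] for i in range(2)])

def bq_impact_matrix(a_sum, sigma):
    """Impact matrix B0 (e_t = B0 * eps_t, var(eps_t) = I) such that the second
    shock has no long-run effect on the first variable."""
    psi = long_run_multiplier(a_sum)
    long_run = cholesky_lower(matmul(matmul(psi, sigma), transpose(psi)))
    return matmul(inverse(psi), long_run)

# Hypothetical inputs (not estimates from the exchange rate data).
a_sum = [[0.3, 0.1], [0.2, 0.4]]
sigma = [[1.0, 0.5], [0.5, 2.0]]

b0 = bq_impact_matrix(a_sum, sigma)
# Flipping the sign of the second shock gives another matrix that fits equally well.
b0_flipped = [[b0[0][0], -b0[0][1]], [b0[1][0], -b0[1][1]]]
```

Both `b0` and `b0_flipped` reproduce `sigma` and satisfy the long-run restriction, so the data alone cannot choose between them; a sign normalization has to come from outside the estimation.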


15. SUMMARY AND CONCLUSIONS 


Intervention analysis was used to determine the effects of installing metal detectors in 
airports. More generally, intervention analysis can be used to ascertain how any deter- 
ministic function affects an economic time series. Usually, the shape of the intervention 
function is clear, as in the metal detector example. However, there are a wide variety of 
possible intervention functions. If there is an ambiguity, the shape of the intervention 
function can be determined using the standard Box–Jenkins criteria for model selection.
The crucial assumption in intervention analysis is that the intervention function
has only deterministic components. 

 Transfer function analysis is appropriate if the “intervention” sequence is stochastic.
If $\{y_t\}$ is endogenous and $\{z_t\}$ is exogenous, a transfer function can be fit using the
five-step procedure discussed in Section 2. The procedure is a straightforward modification
of the standard Box–Jenkins methodology. Similarly, ADLs are a straightforward
way to capture the effect of the time path of an independent variable on a dependent variable. The
resulting impulse response function traces out the effects of $\{z_t\}$ realizations on the time path of the
$\{y_t\}$ sequence. This technique was illustrated by a study showing that terrorist attacks
caused Italy’s tourism revenues to decline by a total of 600 million SDR.

 With economic data, it is not always clear that one variable is dependent and the
others are independent. In the presence of feedback, intervention and transfer function
analyses are inappropriate. Instead, use a vector autoregression, which treats all
variables as jointly endogenous. Each variable is allowed to depend on its past
realizations and on the past realizations of all other variables in the system. There is no
special attention paid to parsimony since the imposition of the “incredible identifica- 
tion restrictions” may be inconsistent with economic theory. Granger causality tests, 
block exogeneity, and lag length tests can help select a more parsimonious model. 
Ordinary least squares yields efficient estimates of the VAR coefficients. One difficulty
with VAR analysis is that the underlying structural model cannot be recovered
from the estimated VAR. An arbitrary Choleski decomposition provides an extra equation
necessary for identification of the structural model. For each variable in the system, 
innovation accounting techniques can be used to ascertain: (i) the percentage of the 
forecast error variance attributable to each of the other variables and (ii) the impulse 
responses to the various innovations. This technique was illustrated by examining the 
relationship between domestic and transnational terrorism. 

An important development is the convergence of traditional economic theory and 
the VAR framework. Structural VARs impose an economic model on the contempora- 
neous movements of the variables. As such, they allow for the identification of the 
parameters of the economic model and the structural shocks. The Bernanke–Sims
procedure can be used to identify (or overidentify) the structural innovations. The Blanchard
and Quah methodology imposes long-run restrictions on the impulse response
functions to exactly identify the structural innovations. An especially useful feature of 
the technique is that it provides a unique decomposition of an economic time series 
into its temporary and permanent components. 

Nevertheless, as summarized in an interesting paper by Todd (1990), VAR results 
may not be robust to reasonable changes in the model’s specification. Sometimes, the 
addition of a time trend, changing the lag length, eliminating a variable from the model, 
or changing the frequency of the data from monthly to quarterly can alter the results of 
a VAR. Similarly, using several plausible ways to measure a variable (e.g., using one 
short-term interest rate instead of another) might lead to different impulse responses or 
variance decompositions. As such, you need to be careful in estimating a VAR. Some 
suggestions are as follows: 


1. Select your variables carefully. Use the variables that most accurately mea- 
sure the phenomena of interest. Moreover, incorporating extraneous variables 
will quickly consume degrees of freedom. Omitting important variables will 
not allow you to interpret your impulse responses and variance decomposi- 
tions properly. 

2. You should have some idea as to whether or not the variables in question are 
stationary, trend stationary, or difference stationary. Granger causality tests 
can be meaningless if they involve nonstationary variables. You do not want 
to include a time trend unless the variables actually contain a determinis- 
tic trend. Moreover, the impulse response functions involving nonstationary 
variables can have very large standard errors. 


3. Be sure to perform robustness checks. Todd (1990), for example, checked 
the robustness of Sims’ results using three different measures of the money 
supply and two different interest rate series. He also obtained results with 
and without a trend. The point is to try a number of reasonable specifications. 
Compare several different performance measures of the alternative specifica- 
tions (such as fit, impulse responses, and variance decompositions). Maintain 
a healthy skepticism of any conclusions if the results from the alternative 
estimations are very different. 


QUESTIONS AND EXERCISES


1. Consider three forms of the intervention variable: 


pulse: $z_1 = 1$ and all other $z_i = 0$
pure jump: $z_1 = z_2 = \dots = 1$ and all other $z_i = 0$ for $i > 10$
prolonged impulse: $z_1 = 1$; $z_2 = 0.75$; $z_3 = 0.5$; $z_4 = 0.25$; and all other values of $z_i = 0$


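a. Show how each of the following $\{y_t\}$ sequences responds to the three types of
interventions:

i. $y_t = 0.5y_{t-1} + z_t + \varepsilon_t$
ii. $y_t = -0.5y_{t-1} + z_t + \varepsilon_t$
iii. $y_t = 1.25y_{t-1} - 0.5y_{t-2} + z_t + \varepsilon_t$
iv. $y_t = y_{t-1} + z_t + \varepsilon_t$
v. $y_t = 0.75y_{t-1} + 0.25y_{t-2} + z_t + \varepsilon_t$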


b. Notice that the intervention models in iv and v have unit roots. Show that the intervention
variable $z_1 = 1$, $z_2 = -1$, and all other values of $z_i = 0$ has only a temporary effect on
these two sequences.

c. Show that an intervention variable will not have a permanent effect on a unit root process
if all values of $z_i$ sum to zero.

d. Discuss the plausible models you might choose if the $\{y_t\}$ sequence is:

i. stationary and you suspect that the intervention has a growing and then a diminishing
effect.

ii. nonstationary and you suspect that the intervention has a permanent effect on the
level of $\{y_t\}$.

iii. nonstationary and you suspect that the intervention has a temporary effect on the
level of $\{y_t\}$.

iv. nonstationary and you suspect that the intervention increases the trend growth
of $\{y_t\}$.
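
Hand calculations for Question 1 can be checked numerically. The helper below is a minimal sketch (its name and interface are invented for this illustration): it traces the deterministic response of an autoregressive model to an intervention sequence, starting from zero and ignoring the stochastic term. The list index 0 corresponds to the first period in which the intervention occurs.

```python
def intervention_response(ar_coefs, z, horizon):
    """Trace y_t = a_1*y_{t-1} + ... + a_p*y_{t-p} + z_t from a zero start,
    ignoring the stochastic disturbance."""
    y = []
    for t in range(horizon):
        lagged = sum(a * y[t - i] for i, a in enumerate(ar_coefs, start=1) if t - i >= 0)
        z_t = z[t] if t < len(z) else 0.0
        y.append(lagged + z_t)
    return y

pulse = [1.0]
offsetting = [1.0, -1.0]  # an intervention whose values sum to zero, as in part b

stationary_pulse = intervention_response([0.5], pulse, 5)              # model i
unit_root_pulse = intervention_response([1.0], pulse, 5)               # model iv
unit_root_offset = intervention_response([1.0], offsetting, 5)         # model iv
second_order_offset = intervention_response([0.75, 0.25], offsetting, 30)  # model v

print(stationary_pulse)
print(unit_root_pulse)
print(unit_root_offset)
```

Other intervention shapes can be passed in the same way, for example a list of ones for a jump or `[1.0, 0.75, 0.5, 0.25]` for the prolonged impulse.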


2. Former KGB General Sakharovsky has been quoted as saying, “In today’s world,
when nuclear arms have made military force obsolete, terrorism should become our
main weapon.” Now, most analysts believe that the end of the Cold War brought about
a dramatic decline in state-sponsored terrorism. The data set TERRORISM.XLS
contains the quarterly values of various types of domestic and transnational terrorist
incidents over the 1970Q1–2010Q4 period. The precise definition of the
variables is discussed in Enders, Sandler, and Gaibulloev (2011). If you examine
Figure 5.1, you can see that the number of both types of incidents begins to
fall in the early 1990s as a result of the breakup of the Soviet Union in 1991Q4.
There is a second decline after 1997Q4. The U.S. State Department attributes this
decline to diplomatic and law enforcement measures making it harder for terrorists
to operate.

a. Let $\{y_t\}$ denote the quarterly number of transnational incidents. The first step in
estimating an intervention model is to examine the ACF and PACF of the $\{y_t\}$ series
for the 1970Q1–1997Q4 period and try to identify a plausible set of models. Since
the data after 1997Q4 contains 52 observations, it is also reasonable to examine the
ACF and PACF for the 1998Q1–2010Q4 period. What models for $\{y_t\}$ seem most
promising?

b. Jennifer created the dummy variable $z_t$ to represent the decline in the transnational
terrorism series. Specifically, she let $z_t = 1$ after 1997Q4 and $z_t = 0$ for $t <$ 1997Q4. She then
estimated the two models:

$y_t = 29.09 - 14.70z_t$   and   $y_t = 9.10 + 0.323y_{t-1} + 0.374y_{t-2} - 5.00z_t$
     (26.37)  (−1.96)                   (4.15)  (4.39)           (5.15)          (−2.74)

Estimate the two models and determine which seems to be the most satisfactory.


c. Justin, who never liked to take advice, ignored Step 1 and simply looked at the ACF and
PACF for the entire sample period. Why might Justin conclude that the $y_t$ series is very
persistent?

d. Justin thought that an ARMA(1, 1) model could adequately capture the apparent
persistence of the $\{y_t\}$ series. He estimated

$y_t = 30.77 + 0.87y_{t-1} - 16.41z_t - 0.51\varepsilon_{t-1} + \varepsilon_t$
     (9.68)  (16.20)        (−3.51)    (5.11)

In what important ways are Jennifer’s and Justin’s findings for the long-run effects of $z_t$
on $y_t$ quite different?

e. How do the results change if you use a single dummy for 1991Q4? What is the effect of
including the two dummy variables?


3. Let the realized value of the $\{z_t\}$ sequence be such that $z_1 = 1$ and all other values of $z_i = 0$.

a. Use equation (5.11) to trace out the effects of the $\{z_t\}$ sequence on the time path of $y_t$.

b. Use equation (5.12) to trace out the effects of the $\{z_t\}$ sequence on the time paths of $y_t$
and $\Delta y_t$.

c. Use equation (5.13) to trace out the effects of the $\{z_t\}$ sequence on the time paths of $y_t$
and $\Delta y_t$.

d. Assume that $\{z_t\}$ is a white-noise process with a variance equal to unity.

i. Use (5.11) to derive the cross-correlogram between $\{z_t\}$ and $\{y_t\}$.
ii. Use (5.12) to derive the cross-correlogram between $\{z_t\}$ and $\{\Delta y_t\}$.
iii. Use (5.13) to derive the cross-correlogram between $\{z_t\}$ and $\{\Delta y_t\}$.


4. Consider the transfer function model $y_t = 0.5y_{t-1} + z_t + \varepsilon_t$, where $z_t$ is the autoregressive
process $z_t = 0.5z_{t-1} + \varepsilon_{zt}$.

a. Derive the cross-correlations between the filtered $\{y_t\}$ sequence and the $\{\varepsilon_{zt}\}$ sequence.

b. Now suppose $y_t = 0.5y_{t-1} + z_t + 0.5z_{t-1} + \varepsilon_t$ and $z_t = 0.5z_{t-1} + \varepsilon_{zt}$. Derive the
standardized cross-covariances between the filtered $\{y_t\}$ sequence and $\varepsilon_{zt}$. Show that the first
and second cross-covariances are proportional to the cross-correlations. Show that the
cross-covariances decay at the rate 0.5.


5. Use the data on the file ITALY.XLS to estimate a model in the form of (5.9) using
$p = n = 6$.


a. 


C. 


Show that the sample F-value for the restriction a, = c, = 0 is 0.09 with a prob-value of 
0.91. As such, reestimate the model with five lags of each variable and show that it is not 
possible to reject the null hypothesis a; = c; = 0. Show that it is also reasonable to pare 

down the model by restricting cy = c, = c, = 0. 


. Estimate the restricted model from part b using data only from 1972:04 to 1988:04. As 


measured by the AIC and SBC (AIC = −56.05 and SBC = −38.66), you should find
that this model does not fit as well as that in (5.15). As opposed to (5.15), show that this
pared-down model indicates that terrorism increases tourism.

Explain why this methodology does not fare as well as that described in the text. 


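6. Use (5.28) to find the appropriate second-order stochastic difference equation for $y_t$.

\[
\begin{bmatrix} y_t \\ z_t \end{bmatrix}
=
\begin{bmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix}
+
\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix}
\]
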
a. Determine whether the $\{y_t\}$ sequence is stationary.

b. Discuss the shape of the impulse response function of $y_t$ to a one-unit shock in $e_{1t}$ and to
a one-unit shock in $e_{2t}$.

c. Suppose $e_{1t} = \varepsilon_{yt} + 0.5\varepsilon_{zt}$ and that $e_{2t} = \varepsilon_{zt}$. Discuss the shape of the impulse response
function of $y_t$ to a one-unit shock in $\varepsilon_{yt}$. Repeat for a one-unit shock in $\varepsilon_{zt}$.

d. Suppose $e_{1t} = \varepsilon_{yt}$ and that $e_{2t} = 0.5\varepsilon_{yt} + \varepsilon_{zt}$. Discuss the shape of the impulse response
function of $y_t$ to a one-unit shock in $\varepsilon_{yt}$. Repeat for a one-unit shock in $\varepsilon_{zt}$.

e. Use your answers to c and d to explain why the ordering in a Choleski decomposition is
important.

f. Using the notation in (5.27), find $A_1^2$ and $A_1^3$. Does $A_1^n$ appear to approach zero (i.e., the
null matrix)?
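
For part f, powers of the coefficient matrix are tedious to compute by hand. The following sketch, with helper names invented for this illustration, multiplies the matrix by itself in plain Python so that hand calculations can be checked and higher powers inspected.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matrix_power(a, n):
    """Return a**n for a square matrix stored as nested lists (n >= 1)."""
    result = a
    for _ in range(n - 1):
        result = matmul(result, a)
    return result

a1 = [[0.8, 0.2], [0.2, 0.8]]  # coefficient matrix from Question 6

for n in (2, 3, 50):
    print(n, [[round(x, 4) for x in row] for row in matrix_power(a1, n)])
```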

7. Using the notation of (5.20) and (5.21), suppose: $a_{10} = 0$, $a_{20} = 0$, $a_{11} = 0.8$, $a_{12} = 0.2$,
$a_{21} = 0.4$, and $a_{22} = 0.1$.

a. Find the appropriate second-order stochastic difference equation for $y_t$. Determine
whether the $\{y_t\}$ sequence is stationary.

b. Answer parts b through f of Question 6 using these new values of the $a_{ij}$.

c. How would the solution for $y_t$ change if $a_{10} = 0.2$?


8. Suppose the residuals of a VAR are such that var($e_{1t}$) = 0.75, var($e_{2t}$) = 0.5, and
cov($e_{1t}$, $e_{2t}$) = 0.25.

a. Using (5.55) through (5.58) as guides, show that it is not possible to identify the
structural VAR without imposing an additional restriction.

b. Using a Choleski decomposition such that $b_{12} = 0$, find the identified values of $b_{21}$,
var($\varepsilon_{1t}$), and var($\varepsilon_{2t}$).

c. Using a Choleski decomposition such that $b_{21} = 0$, find the identified values of $b_{12}$,
var($\varepsilon_{1t}$), and var($\varepsilon_{2t}$).

d. Using a Sims–Bernanke decomposition such that $b_{12} = 0.5$, find the identified values of
$b_{21}$, var($\varepsilon_{1t}$), and var($\varepsilon_{2t}$).

e. Using a Sims–Bernanke decomposition such that $b_{21} = 0.5$, find the identified values of
$b_{12}$, var($\varepsilon_{1t}$), and var($\varepsilon_{2t}$).

f. Suppose that the first three values of $e_{1t}$ are estimated to be 1, 0, −1 and that the first
three values of $e_{2t}$ are estimated to be −1, 0, 1. Find the first three values of $\varepsilon_{1t}$ and $\varepsilon_{2t}$
using each of the decompositions in parts b through e.

9. This set of exercises uses data from the file entitled QUARTERLY.XLS in order to estimate
the dynamic interrelationships among the level of industrial production, the unemployment
rate, and interest rates. In Chapter 2, you created the interest rate spread ($s_t$) as the
difference between the 10-year rate and the T-bill rate. Now, create the logarithmic change in the
index of industrial production (indprod) as $\Delta lip_t = \ln(indprod_t) - \ln(indprod_{t-1})$ and the
difference in the unemployment rate as $\Delta ur_t = unemp_t - unemp_{t-1}$.

a. Estimate the three-variable VAR using nine lags of each variable and a constant and save
the residuals. Explain why the estimation cannot begin earlier than 1962Q3.
What are the potential advantages of using the variables $\Delta lip_t$ and $\Delta ur_t$ instead of $lip_t$
and $ur_t$?

b. Verify that $\ln(|\Sigma_9|) = -14.68$ and (assuming normality) that the log of the likelihood
function is 622.32. Calculate the multivariate AIC and SBC using the formulas AIC =
$T\ln(|\Sigma|) + 2N$ and SBC = $T\ln(|\Sigma|) + N\ln(T)$. Then calculate the alternative forms
using the formulas AIC* = $-2\ln(L)/T + 2n/T$ and SBC* = $-2\ln(L)/T + n\ln(T)/T$.

c. Estimate the model using three lags of each variable and save the residuals. Show that
the AIC selects the nine-lag model and that the SBC selects the three-lag model. Show
that the same ambiguity applies to the AIC* and SBC*. Why is it important to estimate
the three-variable VAR beginning with 1962Q3?

d. Construct the likelihood ratio test for the null hypothesis of nine lags against the alterna- 
tive of three lags. How many restrictions are there in the system? How many regressors 
are there in each of the unrestricted equations? If you answer correctly, you should find
that the calculated value of $\chi^2$ with 54 degrees of freedom is 98.10, with a significance level
less than 0.001. Hence, the restriction of three lags is binding.

e. Begin with a maximum lag length of 12. Show that the general-to-specific method selects 
nine lags, the AIC selects three lags, and the BIC selects one lag. 
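
The bookkeeping in Question 9 (regressors per equation, number of restrictions, and the two information criteria) can be organized as below. This is a sketch under stated assumptions: the function names are invented, N is taken to be the total number of parameters estimated in all equations, and the sample size `T_HYPOTHETICAL` is a placeholder that must be replaced by the actual number of usable observations.

```python
import math

def regressors_per_equation(n_vars, n_lags, constant=True):
    """Lags of every variable plus an optional intercept."""
    return n_vars * n_lags + (1 if constant else 0)

def lag_restrictions(n_vars, lags_unrestricted, lags_restricted):
    """Number of coefficients set to zero across the whole system."""
    return n_vars * n_vars * (lags_unrestricted - lags_restricted)

def multivariate_aic_sbc(n_obs, log_det_sigma, n_params):
    """AIC = T ln|Sigma| + 2N and SBC = T ln|Sigma| + N ln(T)."""
    aic = n_obs * log_det_sigma + 2 * n_params
    sbc = n_obs * log_det_sigma + n_params * math.log(n_obs)
    return aic, sbc

n_vars = 3
c = regressors_per_equation(n_vars, 9)   # regressors in each unrestricted equation
df = lag_restrictions(n_vars, 9, 3)      # degrees of freedom of the LR test
n_params = n_vars * c                    # N for the nine-lag system

T_HYPOTHETICAL = 200                     # placeholder only, not the sample size in the data
aic, sbc = multivariate_aic_sbc(T_HYPOTHETICAL, -14.68, n_params)
print(c, df, n_params)
```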

10. Question 9 indicates that a 3-lag VAR seems reasonable for the variables $\Delta lip_t$, $\Delta ur_t$, and
$s_t$. Estimate the three-variable, three-lag VAR beginning in 1961Q1 and use the ordering such
that $\Delta lip_t$ is causally prior to $\Delta ur_t$ and that $\Delta ur_t$ is causally prior to $s_t$.

a. If you perform a test to determine whether $s_t$ Granger causes $\Delta lip_t$, you should find that
the F-statistic is 2.44 with a prob-value of 0.065. How do you interpret this result?

b. Verify that $s_t$ Granger causes $\Delta unemp_t$. You should find that the F-statistic is 5.93 with a
prob-value of less than 0.001.

c. It turns out that the correlation coefficient between $e_{1t}$ and $e_{2t}$ is −0.72. The correlation
between $e_{1t}$ and $e_{3t}$ is −0.11 and between $e_{2t}$ and $e_{3t}$ is 0.10. Explain why the ordering in
a Choleski decomposition is likely to be important for obtaining the impulse responses.

d. Verify that the forecast error variance decompositions are:

            Proportion due to          Proportion due to          Proportion due to
            Δlip_t shock (%)           Δur_t shock (%)            s_t shock (%)
Horizon   Δlip_t   Δur_t     s_t     Δlip_t   Δur_t     s_t     Δlip_t   Δur_t     s_t
   1      100.00   51.27    1.13       0.00   48.73    0.08       0.00    0.00   98.79
   4       96.18   64.64    9.44       1.47   32.79    0.99       2.35    2.58   89.58
   8       90.83   57.13   19.99       2.38   29.24    0.97       6.78   13.66   79.04


e. Now estimate the model using the levels of $lip_t$ and $ur_t$. Do you now find a lag length of
5 is appropriate? Compare the forecast error variances to those above.

f. Obtain the impulse response functions from the model using $\Delta lip_t$, $\Delta ur_t$, and $s_t$. Show
that a positive shock to industrial production induces a decline in the unemployment
rate that lasts for six quarters. Then, $\Delta ur_t$ overshoots its long-run level before returning
to zero.

g. Reverse the ordering and explain why the results depend on whether or not $\Delta lip_t$
precedes $\Delta ur_t$.
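
The link between the residual correlation in part c and the one-step-ahead decomposition in part d can be illustrated with a two-variable calculation. The sketch below assumes standardized residuals and a hypothetical function name. It uses the rounded correlation of −0.72, which gives a share of about 51.8%; the 51.27% in the table would correspond to an unrounded correlation near −0.716.

```python
import math

def one_step_share_of_first_shock(rho):
    """Share of the second variable's one-step-ahead forecast error variance
    attributed to the first shock when the first variable is ordered first
    in a Choleski decomposition of the correlation matrix [[1, rho], [rho, 1]]."""
    l21 = rho                         # loading of variable 2 on shock 1
    l22 = math.sqrt(1.0 - rho ** 2)   # loading of variable 2 on its own shock
    return l21 ** 2 / (l21 ** 2 + l22 ** 2)

share = one_step_share_of_first_shock(-0.72)
print(round(100 * share, 2))
```

Because the share equals the squared correlation, a large correlation means that whichever variable is placed first is credited with a large part of the other variable's one-step variance.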

11. This set of exercises uses data from the file entitled QUARTERLY.XLS in order to estimate
the dynamic effects of aggregate demand and supply shocks on industrial production and
the inflation rate. Create the logarithmic change in the index of industrial production
(indprod) as $\Delta lip_t = \ln(indprod_t) - \ln(indprod_{t-1})$ and the inflation rate (as measured by the
CPI) as $inf_t = \log(cpi_t) - \log(cpi_{t-1})$.

a. Determine whether $\Delta lip_t$ and $inf_t$ are stationary.

b. Estimate the two-variable VAR using three lags of each variable and a constant and
save the residuals. Verify that the three-lag specification is selected by the SBC and the
general-to-specific method, whereas the AIC selects five lags.

c. Perform the Granger causality tests. Verify that the F-statistic for the test that inflation
Granger-causes industrial production is 4.82 (with a significance level of 0.003) and
that the F-statistic for the test that industrial production Granger-causes inflation is 5.1050
(with a significance level of 0.002).

d. Now use a Choleski decomposition such that $\Delta lip_t$ is causally prior to $inf_t$. Verify that the
variance decompositions are:


            Proportion due to      Proportion due to
            Δlip_t shock (%)       inf_t shock (%)
Horizon   Δlip_t    inf_t        Δlip_t    inf_t
   1      100.00     1.69          0.00    98.31
   4       97.47    11.21          2.53    88.79
   8       91.05    15.31          8.96    84.69


e. Verify that a positive shock to industrial production acts to increase inflation and that a
positive inflation shock decreases industrial production. Does this make sense in terms of
the standard aggregate supply/aggregate demand model?

f. Now use the Blanchard–Quah decomposition such that the aggregate demand shock
has no long-run effect on industrial production. Verify that the cumulated sums of the
impulse responses are as shown in Figure 5.11. [Note that the responses have been
cumulated and each has been standardized. For example, the two-step response of industrial
production is the sum of the responses for $\Delta lip_{t+1} + \Delta lip_t$. Moreover, each response has been
divided by the standard deviation of the residual from the equation for $\Delta lip_t$.]

g. Does it make economic sense that (i) an aggregate supply shock increases output and
decreases inflation whereas (ii) an aggregate demand shock increases inflation and
short-run output and (iii) an aggregate demand shock has no effect on output in the long
run?

12. Jennifer estimates a structural VAR using output ($y$), money ($m$), and inflation ($i$) such that
the contemporaneous relationships among the variables are:

\[
\begin{bmatrix} e_{yt} \\ e_{mt} \\ e_{it} \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ g_{21} & 1 & g_{23} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{mt} \\ \varepsilon_{it} \end{bmatrix}
\]

where $e_{yt}$, $e_{mt}$, and $e_{it}$ are the regression residuals from the $y_t$, $m_t$, and $i_t$ equations, and $\varepsilon_{yt}$,
$\varepsilon_{mt}$, and $\varepsilon_{it}$ are the pure shocks (i.e., the structural innovations) to $y_t$, $m_t$, and $i_t$, respectively.

a. Is this set of economic restrictions plausible?

b. Explain why the system is overidentified and how the overidentified system can be
estimated.

c. Given that the system is overidentified, discuss an overidentifying restriction you might
want Jennifer to test. How can the test be performed?


CHAPTER 6 


COINTEGRATION AND ERROR- 
CORRECTION MODELS 


Learning Objectives 


1. Introduce the basic concept of cointegration and show that it applies in a
variety of economic models.

2. Show that cointegration necessitates that the stochastic trends of nonstationary
variables be linked.

3. Consider the dynamic paths of cointegrated variables. Since the trends of the
variables are linked, the dynamic paths of such variables must respond to the
current deviation from the equilibrium relationship.

4. Develop the Engle–Granger cointegration test. The econometric methods
underlying the test procedures stem from the theory of simultaneous difference
equations.

5. Illustrate the Engle–Granger method using simulated data.

6. Illustrate the Engle–Granger method using real exchange rate data.

7. Develop the Johansen full-information maximum likelihood cointegration
test.

8. Show how to test restrictions on cointegrating vectors. Discuss inference in
models with $I(1)$ and $I(2)$ variables.

9. Illustrate the Johansen test using simulated data.

10. Show how to estimate ADL models using nonstationary variables and
develop the ADL cointegration test.

11. Compare the Engle–Granger, Johansen, and ADL cointegration tests using
interest rate data.


This chapter explores an exciting development in econometrics: the estimation of a 
structural equation or a VAR containing nonstationary variables. In univariate models, 
we have seen that a stochastic trend can be removed by differencing. The resulting 
stationary series can be estimated using univariate Box–Jenkins techniques. At one
time, the conventional wisdom was to generalize this idea and difference all nonstationary
variables used in a regression analysis. However, the appropriate way to treat
nonstationary variables is not so straightforward in a multivariate context. It is quite
possible for there to be a linear combination of integrated variables that is stationary;
such variables are said to be cointegrated. In the presence of cointegrated variables,
it is possible to model the long-run relationship and the short-run dynamics simultaneously.
Many economic models entail such cointegrating relationships.


1. LINEAR COMBINATIONS OF INTEGRATED 
VARIABLES 


Since money demand studies stimulated much of the cointegration literature, we begin 
by considering a simple model of money demand. Theory suggests that individuals 
want to hold a real quantity of money balances, so that the demand for nominal money 
holdings should be proportional to the price level. Moreover, as real income and the 
associated number of transactions increase, individuals will want to hold increased 
money balances. Finally, since the interest rate is the opportunity cost of holding money, 
money demand should be negatively related to the interest rate. In logarithms, an econo- 
metric specification for such an equation can be written as 


\[
m_t = \beta_0 + \beta_1 p_t + \beta_2 y_t + \beta_3 r_t + e_t \qquad (6.1)
\]

where $m_t$ = demand for money
$p_t$ = price level
$y_t$ = real income
$r_t$ = interest rate
$e_t$ = stationary disturbance term
$\beta_i$ = parameters to be estimated

and all variables but the interest rate are expressed in logarithms.

The hypothesis that the money market is in equilibrium allows the researcher to 
collect time-series data of the money supply (= money demand if the money market 
always clears), the price level, real income (possibly measured using real GDP), and 
an appropriate short-term interest rate. The behavioral assumptions require that $\beta_1 = 1$,
$\beta_2 > 0$, and $\beta_3 < 0$; a researcher conducting such a study would certainly want to test
these parameter restrictions. Be aware that the properties of the unexplained portion
of the demand for money (i.e., the $\{e_t\}$ sequence) are an integral part of the theory. If
the theory is to make any sense at all, any deviation in the demand for money must
necessarily be temporary in nature. Clearly, if $e_t$ has a stochastic trend, the errors in the
model will be cumulative so that deviations from money market equilibrium will not be
eliminated. Hence, a key assumption of the theory is that the $\{e_t\}$ sequence is stationary.

 The problem confronting the researcher is that real GDP, the money supply, price
level, and interest rate can all be characterized as nonstationary $I(1)$ variables. As such,
each variable can meander without any tendency to return to a long-run level. However,
the theory expressed in (6.1) asserts that there exists a linear combination of these
nonstationary variables that is stationary! Solving for the error term, we can rewrite (6.1) as

\[
e_t = m_t - \beta_0 - \beta_1 p_t - \beta_2 y_t - \beta_3 r_t \qquad (6.2)
\]

Since $\{e_t\}$ must be stationary, it follows that the linear combination of integrated
variables given by the right-hand side of (6.2) must also be stationary. Thus, the theory
necessitates that the time paths of the four nonstationary variables $\{m_t\}$, $\{p_t\}$, $\{y_t\}$, and
$\{r_t\}$ be linked. This example illustrates the crucial insight that has dominated much of
the macroeconometric literature in recent years: Equilibrium theories involving
nonstationary variables require the existence of a combination of the variables that is
stationary.
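
A small simulation can make the idea concrete. The sketch below is illustrative only and does not use the money demand data: it builds two artificial $I(1)$ series that share a single random-walk trend, with a hypothetical long-run coefficient `beta`, and compares the sample variance of each series with that of the linear combination that removes the trend.

```python
import random

def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def simulate_cointegrated_pair(n_obs, beta, seed=0):
    """Two I(1) series sharing one random-walk trend:
    y_t = mu_t + noise and z_t = beta * mu_t + noise (standard normal shocks)."""
    rng = random.Random(seed)
    mu, y, z = 0.0, [], []
    for _ in range(n_obs):
        mu += rng.gauss(0.0, 1.0)                  # common stochastic trend
        y.append(mu + rng.gauss(0.0, 1.0))
        z.append(beta * mu + rng.gauss(0.0, 1.0))
    return y, z

beta = 2.0  # hypothetical long-run coefficient linking the two series
y, z = simulate_cointegrated_pair(2000, beta)

# z_t - beta * y_t cancels the trend, leaving only stationary noise.
combo = [z_t - beta * y_t for y_t, z_t in zip(y, z)]

print(sample_variance(y), sample_variance(z), sample_variance(combo))
```

Each series wanders without a fixed mean, so its sample variance grows with the sample length, whereas the variance of the combination stays close to the variance of the noise terms.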

The money demand function is just one example of a stationary combination of 
nonstationary variables. Within any equilibrium framework, the deviations from equi- 
librium must be temporary. Other important economic examples involving stationary 
combinations of nonstationary variables include the following: 


1. Consumption function theory. A simple version of the permanent income
hypothesis maintains that total consumption ($c_t$) is the sum of permanent
consumption ($c_t^p$) and transitory consumption ($c_t^t$). Since permanent consumption
is proportional to permanent income ($y_t^p$), we can let $\beta$ be the constant of
proportionality and write $c_t = \beta y_t^p + c_t^t$. Transitory consumption is necessarily
a stationary variable, and both consumption and permanent income are
reasonably characterized as $I(1)$ variables. As such, the permanent income
hypothesis requires that the linear combination of two $I(1)$ variables given by
$c_t - \beta y_t^p$ be stationary.

2. Unbiased forward rate hypothesis. One form of the efficient market hypothesis
asserts that the forward (or futures) price of an asset should equal the
expected value of that asset's spot price in the future. Foreign exchange market
efficiency requires that the one-period forward exchange rate equal the
expectation of the spot rate in the next period. Letting $f_t$ denote the log of the
one-period price of forward exchange in $t$ and $s_t$ the log of the spot price of
foreign exchange in $t$, the theory asserts that $E_t s_{t+1} = f_t$. If this relationship
fails, speculators can expect to make a pure profit on their trades in the foreign
exchange market. If agents' expectations are rational, the forecast error
for the spot rate in $t + 1$ will have a conditional mean equal to zero, so that
$s_{t+1} - E_t s_{t+1} = \varepsilon_{t+1}$ where $E_t \varepsilon_{t+1} = 0$. Combining the two equations yields
$s_{t+1} = f_t + \varepsilon_{t+1}$. Since $\{s_t\}$ and $\{f_t\}$ are $I(1)$ variables, the unbiased forward
rate hypothesis necessitates that there be a linear combination of nonstationary
spot and forward exchange rates that is stationary.


3. Commodity market arbitrage and purchasing power parity. Theories of spatial
competition suggest that in the short run, prices of similar products in
varied markets might differ. However, arbitrageurs will prevent the various prices
from moving too far apart even if the prices are nonstationary. Similarly, the
prices of Apple computers and PCs have exhibited sustained declines. Economic
theory suggests that these simultaneous declines are related to each
other since a price discrepancy between these similar products cannot
continually widen. Also, as we saw in Chapter 4, purchasing power parity places
restrictions on the movements of nonstationary price levels and exchange
rates. If $e_t$ denotes the log of the price of foreign exchange and $p_t$ and $p_t^*$
denote, respectively, the logs of domestic and foreign price levels, long-run
PPP requires that the linear combination $e_t + p_t^* - p_t$ be stationary.


All of these examples illustrate the concept of cointegration as introduced by 
Engle and Granger (1987). Their formal analysis begins by considering a set of eco- 
nomic variables in long-run equilibrium when 


$$\beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_n x_{nt} = 0$$

Letting $\beta$ and $x_t$ denote the vectors $(\beta_1, \beta_2, \ldots, \beta_n)$ and $(x_{1t}, x_{2t}, \ldots, x_{nt})'$, the
system is in long-run equilibrium when $\beta x_t = 0$. The deviation from long-run
equilibrium, called the equilibrium error, is $e_t$, so that

$$e_t = \beta x_t$$


If the equilibrium is meaningful, it must be the case that the equilibrium error 
process is stationary. In a sense, the use of the term equilibrium is unfortunate because 
economic theorists and econometricians use the term in different ways. Economic theo- 
rists usually use the term to refer to an equality between desired and actual transactions. 
The econometric use of the term makes reference to any long-run relationship among 
nonstationary variables. Cointegration does not require that the long-run relationship 
be generated by market forces or by the behavioral rules of individuals. In Engle and 
Granger’s use of the term, the equilibrium relationship may be causal, behavioral, 
or simply a reduced-form relationship among similarly trending variables. Engle and 
Granger (1987) provide the following definition of cointegration: 

The components of the vector $x_t = (x_{1t}, x_{2t}, \ldots, x_{nt})'$ are said to be cointegrated of
order $d$, $b$, denoted by $x_t \sim CI(d, b)$, if

1. All components of $x_t$ are integrated of order $d$.

2. There exists a vector $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$ such that the linear combination
$\beta x_t = \beta_1 x_{1t} + \beta_2 x_{2t} + \cdots + \beta_n x_{nt}$ is integrated of order $(d - b)$ where $b > 0$.
Note that the vector $\beta$ is called the cointegrating vector.¹


In terms of equation (6.1), if the money supply, price level, real income, and interest
rate are all $I(1)$ and the linear combination $m_t - \beta_0 - \beta_1 p_t - \beta_2 y_t - \beta_3 r_t = e_t$ is
stationary, then the variables are cointegrated of order (1, 1). The vector $x_t$ is $(m_t, 1, p_t, y_t, r_t)'$
and the cointegrating vector $\beta$ is $(1, -\beta_0, -\beta_1, -\beta_2, -\beta_3)$. The deviation from long-run
money market equilibrium is $e_t$; since $\{e_t\}$ is stationary, this deviation is temporary in
nature.

There are four important points to note about the definition: 


1. Cointegration typically refers to a linear combination of nonstationary
variables. Theoretically, it is quite possible that nonlinear long-run relationships
exist among a set of integrated variables. However, as discussed in Chapter 7,
the current state of econometric practice is just beginning to allow for tests of
nonlinear cointegrating relationships. Also note that the cointegrating vector
is not unique. If $(\beta_1, \beta_2, \ldots, \beta_n)$ is a cointegrating vector, then for any nonzero
value of $\lambda$, $(\lambda\beta_1, \lambda\beta_2, \ldots, \lambda\beta_n)$ is also a cointegrating vector. Typically, one
of the variables is used to normalize the cointegrating vector by fixing its
coefficient at unity. To normalize the cointegrating vector with respect to $x_{1t}$,
simply select $\lambda = 1/\beta_1$.

2. From Engle and Granger's original definition, cointegration refers to variables
that are integrated of the same order. Of course, this does not imply that
all integrated variables are cointegrated; usually, a set of $I(d)$ variables is not
cointegrated. Such a lack of cointegration implies no long-run equilibrium
among the variables, so that they can wander arbitrarily far from each other. If
two variables are integrated of different orders, they cannot be cointegrated.
Suppose $x_{1t}$ is $I(d_1)$ and $x_{2t}$ is $I(d_2)$ where $d_2 > d_1$. Question 7 at the end of
this chapter asks you to prove that any linear combination of $x_{1t}$ and $x_{2t}$ is
$I(d_2)$.

Nevertheless, it is possible to find equilibrium relationships among
groups of variables that are integrated of different orders. Suppose that $x_{1t}$
and $x_{2t}$ are $I(2)$ and that the other variables under consideration are $I(1)$. As
such, there cannot be a cointegrating relationship between $x_{1t}$ (or $x_{2t}$) and $x_{3t}$.
However, if $x_{1t}$ and $x_{2t}$ are $CI(2, 1)$, there exists a linear combination of the
form $\beta_1 x_{1t} + \beta_2 x_{2t}$ which is $I(1)$. It is possible that this combination of $x_{1t}$ and
$x_{2t}$ is cointegrated with the $I(1)$ variables. Lee and Granger (1990) use the
term multicointegration to refer to this type of circumstance.

3. There may be more than one independent cointegrating vector for a set of
$I(1)$ variables. The number of cointegrating vectors is called the cointegrating
rank of $x_t$. For example, suppose that the monetary authorities followed a
feedback rule such that they decreased the money supply when nominal GDP
was high and increased the nominal money supply when nominal GDP was
low. This feedback rule might be represented by

$$m_t = \gamma_0 - \gamma_1 (y_t + p_t) + e_{1t} = \gamma_0 - \gamma_1 y_t - \gamma_1 p_t + e_{1t} \tag{6.3}$$

where $\{e_{1t}\}$ = a stationary error in the money supply feedback rule.

Given the money demand function in (6.1), there are two cointegrating
vectors for the money supply, price level, real income, and the interest rate.
Let $\beta$ be the $(2 \times 5)$ matrix:

$$\beta = \begin{bmatrix} 1 & -\beta_0 & -\beta_1 & -\beta_2 & -\beta_3 \\ 1 & -\gamma_0 & \gamma_1 & \gamma_1 & 0 \end{bmatrix}$$

The two linear combinations given by $\beta x_t$ are stationary. As such, the
cointegrating rank of $x_t$ is two. As a practical matter, if multiple cointegrating
vectors are found, it may not be possible to identify the behavioral relationships
from what may be reduced-form relationships. As shown below, if $x_t$
has $n$ nonstationary components, there may be as many as $n - 1$ linearly
independent cointegrating vectors. Hence, if $x_t$ contains only two variables, there
can be at most one independent cointegrating vector.
4. Most of the cointegration literature focuses on the case in which each
variable contains a single unit root. The reason is that traditional regression
or time-series analysis applies when variables are $I(0)$ and few economic
variables are integrated of an order higher than unity. When it is unambiguous,
many authors use the term cointegration to refer to the case in which
variables are $CI(1, 1)$.
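The non-uniqueness and normalization of the cointegrating vector noted in the first point can be checked numerically. The following is a minimal sketch; the function name `normalize` and the numerical values are assumptions chosen purely for illustration and do not come from the text.

```python
def normalize(beta, index=0):
    """Rescale a cointegrating vector so that beta[index] equals one.

    This applies lambda = 1 / beta[index]; index=0 corresponds to
    normalizing with respect to the first variable, x_1t.
    """
    if beta[index] == 0:
        raise ValueError("cannot normalize on a zero coefficient")
    lam = 1.0 / beta[index]
    return [lam * b for b in beta]


# Hypothetical cointegrating vector, for illustration only.
beta = [2.0, -4.0, 1.0]
beta_normalized = normalize(beta)

# For any observation x_t, the two vectors give proportional equilibrium
# errors, so they describe the same long-run relationship.
x_t = [3.0, 1.0, 0.5]  # hypothetical observation
error_original = sum(b * x for b, x in zip(beta, x_t))
error_normalized = sum(b * x for b, x in zip(beta_normalized, x_t))
print(beta_normalized, error_original, error_normalized)
```

Because the rescaled error is just the original error multiplied by a nonzero constant, stationarity of one implies stationarity of the other.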


Worksheet 6.1 illustrates some of the important properties of cointegration
relationships. In Case 1, both the $\{y_t\}$ and $\{z_t\}$ sequences were constructed so as to be
random walk plus noise processes. Although the 20 realizations shown generally decline,
extending the sample would eliminate this tendency. In any event, neither series shows
any tendency to return to a long-run level, and formal Dickey-Fuller tests are not able
to reject the null hypothesis of a unit root in either series. Although each series is
nonstationary, you can see that they do move together. In fact, the difference between the
series $(y_t - z_t)$, shown in the second graph, is stationary; the equilibrium error term
$e_t = (y_t - z_t)$ has a zero mean and a constant variance.


WORKSHEET 6.1


ILLUSTRATING COINTEGRATED SYSTEMS


CASE 1: The series $\{\mu_t\}$ is a random walk process and $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$ are white noise.
Hence, the $\{y_t\}$ and $\{z_t\}$ sequences are both random walk plus noise processes. Although
each is nonstationary, the two sequences have the same stochastic trend; hence they are
cointegrated such that the linear combination $(y_t - z_t)$ is stationary. The equilibrium error
term $(\varepsilon_{yt} - \varepsilon_{zt})$ is an $I(0)$ process.

$y_t = \mu_t + \varepsilon_{yt}$, $z_t = \mu_t + \varepsilon_{zt}$

[Graphs: the time paths of $y_t$ and $z_t$; the equilibrium error $y_t - z_t$.]


CASE 2: All three sequences are random walk plus noise processes. As constructed, no
two are cointegrated. However, the linear combination $(y_t + z_t - w_t)$ is stationary; hence,
the three variables are cointegrated. The equilibrium error is an $I(0)$ process.

$y_t = \mu_{yt} + \varepsilon_{yt}$, $z_t = \mu_{zt} + \varepsilon_{zt}$, $w_t = \mu_{wt} + \varepsilon_{wt}$

[Graphs: the time paths of $y_t$, $z_t$, and $w_t$; the equilibrium error $y_t + z_t - w_t$.]

Case 2 illustrates cointegration among three random walk plus noise processes. 
As in Case 1, no series exhibits a tendency to return to a long-run level, and formal 
Dickey—Fuller tests are not able to reject the null hypothesis of a unit root in any of the 
three. In contrast to the previous case, no two of the series appear to be cointegrated; 
each series seems to “meander” away from the other two. However, as shown in the
second graph, there exists a stationary linear combination of the three such that
$e_t = y_t + z_t - w_t$. Thus, it follows that the dynamic behavior of at least one variable must be
restricted by the values of the other variables in the system.

Figure 6.1 displays the information of Case 1 in a scatter plot of $\{y_t\}$ against
the associated value of $\{z_t\}$; each of the 20 points represents one of the ordered pairs
$(y_1, z_1), (y_2, z_2), \ldots, (y_{20}, z_{20})$. Comparing Worksheet 6.1 and Figure 6.1, you can see
that low values in the $\{y_t\}$ sequence are associated with low values in the $\{z_t\}$ sequence
and that values near zero in one series are associated with values near zero in the other.
Since both series move together over time, there is a positive relationship between 
the two. The least-squares line in the scatter plot reveals this to be a strong positive 
association. In fact, this line is the “long-run” equilibrium relationship between the 
series, and the deviations from the line are the stationary deviations from long-run 
equilibrium. 


FIGURE 6.1 Scatter Plot of Cointegrated Variables

[Scatter plot: values of $z$ (vertical axis) against values of $y$ (horizontal axis).]
The scatter plot was drawn using the $\{y_t\}$ and $\{z_t\}$ sequences from Case 1 of
Worksheet 6.1. Since both series decline over time, there appears to be a positive
relationship between the two. The equilibrium regression line is shown.

For comparison purposes, Panel (a) in Worksheet 6.2 shows the time paths of two
random walk plus noise processes that are not cointegrated. Each seems to meander
without any tendency to approach the other. The scatter plot shown in Panel (b) confirms
the impression of no long-run relationship between the variables. The deviations from
the straight line showing the regression of $z_t$ on $y_t$ are substantial. Plotting the regression
residuals against time [see Panel (c)] suggests that the regression residuals are not
stationary.
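The comparison case can be sketched in a few lines: two independently generated random walk plus noise series are regressed on one another and the residuals are inspected. This is only an illustrative sketch under stated assumptions: the seed, the sample length, the unit-variance Gaussian shocks, and the helper name `random_walk_plus_noise` are all choices made here, not taken from the text.

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility only
T = 500         # sample length chosen for illustration


def random_walk_plus_noise(n):
    """Return n observations of a random walk plus white noise."""
    mu, series = 0.0, []
    for _ in range(n):
        mu += random.gauss(0.0, 1.0)                # stochastic trend
        series.append(mu + random.gauss(0.0, 1.0))  # trend plus noise
    return series


y = random_walk_plus_noise(T)
z = random_walk_plus_noise(T)  # generated independently of y

# OLS regression of z on y with an intercept, from sample moments.
y_bar, z_bar = sum(y) / T, sum(z) / T
s_yy = sum((a - y_bar) ** 2 for a in y)
s_yz = sum((a - y_bar) * (b - z_bar) for a, b in zip(y, z))
b1 = s_yz / s_yy
b0 = z_bar - b1 * y_bar
residuals = [b - b0 - b1 * a for a, b in zip(y, z)]

# First-order sample autocorrelation of the residuals; values close to
# one are a symptom (not a formal test) of nonstationary residuals.
rho1 = (sum(residuals[t] * residuals[t - 1] for t in range(1, T))
        / sum(e ** 2 for e in residuals))
print(f"slope = {b1:.3f}, residual autocorrelation = {rho1:.3f}")
```

The autocorrelation is only an informal diagnostic; a formal conclusion about the residuals would require a unit root test with appropriate critical values.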


WORKSHEET 6.2


NONCOINTEGRATED VARIABLES


The $\{y_t\}$ and $\{z_t\}$ sequences are constructed to be independent random walk plus noise
processes. There is no cointegrating relationship between the two variables. As shown
in graph (a), both seem to meander without any tendency to come together. Graph (b)
shows the scatter plot of the two sequences and the regression line $z_t = \beta_0 + \beta_1 y_t$.
However, this regression line is spurious. As shown in graph (c), the regression residuals are
nonstationary.

$y_t = \mu_{yt} + \varepsilon_{yt}$, $z_t = \mu_{zt} + \varepsilon_{zt}$

[Panel (a): time paths of $y_t$ and $z_t$. Panel (b): scatter plot with the regression of $z_t$ on $y_t$.
Panel (c): regression residuals.]


2. COINTEGRATION AND COMMON TRENDS


Stock and Watson’s (1988) observation that cointegrated variables share common 
stochastic trends provides a very useful way to understand cointegration relationships. 
For ease of exposition, return to the case in which the vector $x_t$ contains only two
variables so that $x_t = (y_t, z_t)'$. Ignoring cyclical and seasonal terms, we can write each
variable as a random walk plus an irregular (but not necessarily a white noise) component:

$$y_t = \mu_{yt} + e_{yt} \tag{6.4}$$
$$z_t = \mu_{zt} + e_{zt} \tag{6.5}$$

where $\mu_{it}$ = a random walk process representing the stochastic trend in variable $i$
$e_{it}$ = the stationary (irregular) component of variable $i$


If $\{y_t\}$ and $\{z_t\}$ are cointegrated of order (1, 1), there must be nonzero values of $\beta_1$
and $\beta_2$ for which the linear combination $\beta_1 y_t + \beta_2 z_t$ is stationary. Consider the sum

$$\begin{aligned}
\beta_1 y_t + \beta_2 z_t &= \beta_1(\mu_{yt} + e_{yt}) + \beta_2(\mu_{zt} + e_{zt}) \\
&= (\beta_1 \mu_{yt} + \beta_2 \mu_{zt}) + (\beta_1 e_{yt} + \beta_2 e_{zt})
\end{aligned} \tag{6.6}$$

For $\beta_1 y_t + \beta_2 z_t$ to be stationary, the term $(\beta_1 \mu_{yt} + \beta_2 \mu_{zt})$ must vanish. After all, if either
of the two trends appears in (6.6), the linear combination $\beta_1 y_t + \beta_2 z_t$ will also have a
trend. Since the second term within parentheses is stationary, the necessary and
sufficient condition for $\{y_t\}$ and $\{z_t\}$ to be $CI(1, 1)$ is

$$\beta_1 \mu_{yt} + \beta_2 \mu_{zt} = 0 \tag{6.7}$$

Clearly, $\mu_{yt}$ and $\mu_{zt}$ are variables whose realized values will be continually changing
over time. Since we preclude both $\beta_1$ and $\beta_2$ from being equal to zero, it follows that
(6.7) holds for all $t$ if and only if

$$\mu_{yt} = -\beta_2 \mu_{zt} / \beta_1$$

For nonzero values of $\beta_1$ and $\beta_2$, the only way to ensure the equality is for the stochastic
trends to be identical up to a scalar. Thus, up to the scalar $-\beta_2/\beta_1$, two $I(1)$ stochastic
processes $\{y_t\}$ and $\{z_t\}$ must have the same stochastic trend if they are cointegrated of
order (1, 1).

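Return your attention to Worksheet 6.1. In Case 1, the $\{y_t\}$ and $\{z_t\}$ sequences
were constructed so as to satisfy

$$y_t = \mu_t + \varepsilon_{yt}$$
$$z_t = \mu_t + \varepsilon_{zt}$$
$$\mu_t = \mu_{t-1} + \varepsilon_t$$

where $\varepsilon_{yt}$, $\varepsilon_{zt}$, and $\varepsilon_t$ are independently distributed white-noise disturbances.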

By construction, $\mu_t$ is a pure random walk process representing the same stochastic
trend for both the $\{y_t\}$ and $\{z_t\}$ sequences. The value of $\mu_0$ was initialized to zero, and
three sets of 20 random numbers were drawn to represent the $\{\varepsilon_{yt}\}$, $\{\varepsilon_{zt}\}$, and $\{\varepsilon_t\}$
sequences. Using these realizations and the initial value of $\mu_0$, the $\{y_t\}$, $\{z_t\}$, and $\{\mu_t\}$
sequences were constructed. As you can clearly determine, subtracting the realized
value of $z_t$ from $y_t$ results in a stationary sequence:

$$y_t - z_t = (\mu_t + \varepsilon_{yt}) - (\mu_t + \varepsilon_{zt}) = \varepsilon_{yt} - \varepsilon_{zt}$$


To state the point using Engle and Granger's terminology, multiplying the
cointegrating vector $\beta = (1, -1)$ by the vector $x_t = (y_t, z_t)'$ yields the stationary sequence
$e_t = \varepsilon_{yt} - \varepsilon_{zt}$. Indeed, the equilibrium error term shown in the second graph of
Worksheet 6.1 has all the hallmarks of a stationary process. The essential insight of Stock and
Watson (1988) is that the parameters of the cointegrating vector must be such that they
purge the trend from the linear combination. Any other linear combination of the two
variables contains a trend so that the cointegrating vector is unique up to a normalizing
scalar. Hence, $\beta_3 y_t + \beta_4 z_t$ cannot be stationary unless $\beta_3/\beta_4 = \beta_1/\beta_2$.
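The construction of Case 1 can be reproduced with a short simulation. The sketch below follows the three equations for $y_t$, $z_t$, and $\mu_t$; the seed and the use of standard normal shocks are assumptions made here for illustration, since the text does not report the distribution or the draws actually used in the worksheet.

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility only
T = 20          # the worksheet uses 20 realizations

mu = 0.0        # mu_0 is initialized to zero, as in the text
y, z, eps_y, eps_z = [], [], [], []
for _ in range(T):
    mu += random.gauss(0.0, 1.0)  # mu_t = mu_{t-1} + eps_t
    ey = random.gauss(0.0, 1.0)   # white-noise shock to y_t
    ez = random.gauss(0.0, 1.0)   # white-noise shock to z_t
    y.append(mu + ey)             # y_t = mu_t + eps_yt
    z.append(mu + ez)             # z_t = mu_t + eps_zt
    eps_y.append(ey)
    eps_z.append(ez)

# Apply the cointegrating vector (1, -1) to x_t = (y_t, z_t)'.
beta = (1.0, -1.0)
equilibrium_error = [beta[0] * a + beta[1] * b for a, b in zip(y, z)]

# The common trend cancels, leaving only eps_yt - eps_zt.
largest_gap = max(abs(e - (ey - ez))
                  for e, ey, ez in zip(equilibrium_error, eps_y, eps_z))
print(largest_gap)
```

Up to floating-point rounding, the equilibrium error coincides with the difference of the two noise terms, so the trend has been purged from the linear combination.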

Recall that Case 2 of Worksheet 6.1 illustrates cointegration among three random
walk plus noise processes. Each process is $I(1)$, and Dickey-Fuller unit root tests would
not be able to reject the null hypothesis that each contains a unit root. As you can see
in the lower portion of Worksheet 6.1, no pairwise combination of the series appears
to be cointegrated. Each series seems to meander, and, as opposed to Case 1, no one
single series appears to remain close to any other series. However, by construction, the
trend in $w_t$ is the simple summation of the trends in $y_t$ and $z_t$:

$$\mu_{wt} = \mu_{yt} + \mu_{zt}$$

Here, the vector $x_t = (y_t, z_t, w_t)'$ has the cointegrating vector $(1, 1, -1)$, so that the linear
combination $y_t + z_t - w_t$ is stationary. Consider

$$y_t + z_t - w_t = (\mu_{yt} + \varepsilon_{yt}) + (\mu_{zt} + \varepsilon_{zt}) - (\mu_{wt} + \varepsilon_{wt}) = \varepsilon_{yt} + \varepsilon_{zt} - \varepsilon_{wt}$$

This example illustrates the general point that cointegration will occur whenever
the trend in one variable can be expressed as a linear combination of the trends in the
other variable(s). In such circumstances it is always possible to find a vector $\beta$ such that
the linear combination $\beta_1 y_t + \beta_2 z_t + \beta_3 w_t$ does not contain a trend. The result easily
generalizes to the case of $n$ variables. Consider the vector representation:


$$x_t = \mu_t + e_t \tag{6.8}$$

where $x_t$ = the vector $(x_{1t}, x_{2t}, \ldots, x_{nt})'$
$\mu_t$ = the vector of stochastic trends $(\mu_{1t}, \mu_{2t}, \ldots, \mu_{nt})'$
$e_t$ = an $n \times 1$ vector of stationary components

If one trend can be expressed as a linear combination of the other trends in the
system, it means that there exists a vector $\beta$ such that

$$\beta_1 \mu_{1t} + \beta_2 \mu_{2t} + \cdots + \beta_n \mu_{nt} = 0$$

Premultiply (6.8) by this set of $\beta_i$s to obtain

$$\beta x_t = \beta \mu_t + \beta e_t$$

Since $\beta \mu_t = 0$, it follows that $\beta x_t = \beta e_t$. Hence, the linear combination $\beta x_t$ is
stationary. As shown in Section 8, this argument easily generalizes to the case of multiple
cointegrating vectors.
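The argument that $\beta \mu_t = 0$ implies $\beta x_t = \beta e_t$ can be illustrated with the three-variable system of Case 2. The following sketch is built on assumptions made here (seed, sample length, standard normal shocks, and the helper name `combine`); it is not the construction used for the worksheet.

```python
import random

random.seed(2)           # arbitrary seed, for reproducibility only
T = 200                  # sample length chosen for illustration
beta = (1.0, 1.0, -1.0)  # cointegrating vector for x_t = (y_t, z_t, w_t)'


def combine(weights, values):
    """Return the linear combination sum_i weights[i] * values[i]."""
    return sum(w * v for w, v in zip(weights, values))


mu_y = mu_z = 0.0
beta_mu, beta_e, beta_x = [], [], []
for _ in range(T):
    mu_y += random.gauss(0.0, 1.0)      # trend in y_t
    mu_z += random.gauss(0.0, 1.0)      # trend in z_t
    trends = (mu_y, mu_z, mu_y + mu_z)  # trend in w_t is the sum of the others
    noise = tuple(random.gauss(0.0, 1.0) for _ in range(3))
    x = tuple(m + e for m, e in zip(trends, noise))  # x_t = mu_t + e_t
    beta_mu.append(combine(beta, trends))
    beta_e.append(combine(beta, noise))
    beta_x.append(combine(beta, x))

max_trend = max(abs(v) for v in beta_mu)                   # beta * mu_t
max_gap = max(abs(a - b) for a, b in zip(beta_x, beta_e))  # beta*x_t - beta*e_t
print(max_trend, max_gap)
```

Both quantities are zero up to rounding error, which is the numerical counterpart of the statement that the trends drop out of the linear combination.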


3. COINTEGRATION AND ERROR CORRECTION 


A principal feature of cointegrated variables is that their time paths are influenced by 
the extent of any deviation from long-run equilibrium. After all, if the system is to return 
to long-run equilibrium, the movements of at least some of the variables must respond 
to the magnitude of the disequilibrium. Before proceeding further, be aware that we 
will be examining the time paths of multiple nonstationary time-series variables. To do
so in a tractable way, we will need to draw on the relationship between the rank of a matrix
and its characteristic roots. The required mathematics are provided in Appendix 6.1.

The relationship between long-term and short-term interest rates illustrates how 
variables might adjust to any discrepancies from the long-run equilibrium relationship. 
Clearly, the theory of the term structure of interest rates implies a long-run relationship 
between long- and short-term rates. If the gap between the long- and short-term rates 
is “large” relative to the long-run relationship, the short-term rate must ultimately rise 
relative to the long-term rate. Of course, the gap can be closed by (1) an increase in the
short-term rate and/or a decrease in the long-term rate, (2) an increase in the long-term
rate but a commensurately larger rise in the short-term rate, or (3) a fall in the long-term
rate but a smaller fall in the short-term rate. Without a full dynamic specification of the
model, it is not possible to determine which of the possibilities will occur. Neverthe- 
less, the short-run dynamics must be influenced by the deviation from the long-run 
relationship. 

The dynamic model implied by this discussion is one of error correction. In an
error-correction model, the short-term dynamics of the variables in the system are
influenced by the deviation from equilibrium. If we assume that both interest rates are $I(1)$, a
simple error-correction model that could apply to the term structure of interest rates is

$$\Delta r_{St} = \alpha_S (r_{Lt-1} - \beta r_{St-1}) + \varepsilon_{St}, \quad \alpha_S > 0 \tag{6.9}$$
$$\Delta r_{Lt} = -\alpha_L (r_{Lt-1} - \beta r_{St-1}) + \varepsilon_{Lt}, \quad \alpha_L > 0 \tag{6.10}$$

where $\varepsilon_{St}$ and $\varepsilon_{Lt}$ are white-noise disturbance terms which may be correlated, $r_{Lt}$ and
$r_{St}$ are the long- and short-term interest rates, and $\alpha_S$, $\alpha_L$, and $\beta$ are parameters.

As specified, the short- and long-term interest rates change in response to stochastic
shocks (represented by $\varepsilon_{St}$ and $\varepsilon_{Lt}$) and in response to the previous period's deviation
from long-run equilibrium. Everything else being equal, if this deviation happened to
be positive (so that $r_{Lt-1} - \beta r_{St-1} > 0$), the short-term interest rate would rise and the
long-term rate would fall. Long-run equilibrium is attained when $r_{Lt} = \beta r_{St}$, so that the
expected change in each rate is zero.
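A short simulation makes the adjustment mechanism concrete. Subtracting $\beta$ times (6.9) from (6.10) shows that the gap $g_t = r_{Lt} - \beta r_{St}$ obeys $g_t = (1 - \alpha_L - \beta\alpha_S) g_{t-1} + \varepsilon_{Lt} - \beta\varepsilon_{St}$, so the deviation dies out when $|1 - \alpha_L - \beta\alpha_S| < 1$. The sketch below checks this identity; all parameter values, starting values, and the shock standard deviation are hypothetical, and the two shocks are drawn independently for simplicity even though the text allows them to be correlated.

```python
import random

random.seed(3)                          # arbitrary seed, for reproducibility only
alpha_S, alpha_L, beta = 0.2, 0.3, 1.0  # hypothetical parameter values
T = 200                                 # sample length chosen for illustration

r_S, r_L = [2.0], [4.0]                 # arbitrary starting interest rates
shocks = []
for _ in range(1, T):
    gap = r_L[-1] - beta * r_S[-1]      # last period's deviation
    e_S = random.gauss(0.0, 0.1)
    e_L = random.gauss(0.0, 0.1)
    next_S = r_S[-1] + alpha_S * gap + e_S  # equation (6.9)
    next_L = r_L[-1] - alpha_L * gap + e_L  # equation (6.10)
    r_S.append(next_S)
    r_L.append(next_L)
    shocks.append((e_S, e_L))

gaps = [long_rate - beta * short_rate for short_rate, long_rate in zip(r_S, r_L)]
rho = 1.0 - alpha_L - beta * alpha_S    # implied AR(1) coefficient of the gap

# Largest discrepancy between the simulated gap and its AR(1) representation.
max_discrepancy = max(
    abs(gaps[t] - (rho * gaps[t - 1] + shocks[t - 1][1] - beta * shocks[t - 1][0]))
    for t in range(1, T)
)
print(rho, max_discrepancy)
```

With these hypothetical values the gap shrinks by half each period on average, while each interest rate on its own continues to wander.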

Here you can see the relationship between error-correcting models and cointegrated
variables. By assumption, $\Delta r_{St}$ is stationary so that the left-hand side of (6.9) is
$I(0)$. For (6.9) to be sensible, the right-hand side must be $I(0)$ as well. Given that $\varepsilon_{St}$ is
stationary, it follows that the linear combination $r_{Lt-1} - \beta r_{St-1}$ must also be stationary;
hence, the two interest rates must be cointegrated with the cointegrating vector $(1, -\beta)$.
Of course, the identical argument applies to (6.10). The essential point to note is that
the error-correction representation necessitates that the two variables be cointegrated
of order (1, 1). This result is unaltered if we formulate a more general model by
introducing the lagged changes of each rate into both equations:

$$\Delta r_{St} = a_{10} + \alpha_S (r_{Lt-1} - \beta r_{St-1}) + \sum a_{11}(i)\,\Delta r_{St-i} + \sum a_{12}(i)\,\Delta r_{Lt-i} + \varepsilon_{St} \tag{6.11}$$
$$\Delta r_{Lt} = a_{20} - \alpha_L (r_{Lt-1} - \beta r_{St-1}) + \sum a_{21}(i)\,\Delta r_{St-i} + \sum a_{22}(i)\,\Delta r_{Lt-i} + \varepsilon_{Lt} \tag{6.12}$$

Again, $\varepsilon_{St}$, $\varepsilon_{Lt}$, and all terms involving $\Delta r_{St-i}$ and $\Delta r_{Lt-i}$ are stationary. Thus, the linear
combination of interest rates $r_{Lt-1} - \beta r_{St-1}$ must also be stationary.

Inspection of (6.11) and (6.12) reveals a striking similarity to the VAR models of
the previous chapter. This two-variable error-correction model is a bivariate VAR in first
differences augmented by the error-correction terms $\alpha_S (r_{Lt-1} - \beta r_{St-1})$ and
$-\alpha_L (r_{Lt-1} - \beta r_{St-1})$. Notice that $\alpha_S$ and $\alpha_L$ have the interpretation of speed of adjustment
parameters. The larger $\alpha_S$ is, the greater the response of $r_{St}$ to the previous period's deviation
from long-run equilibrium. At the opposite extreme, very small values of $\alpha_S$ imply that
the short-term interest rate is unresponsive to last period's equilibrium error. For the
$\{\Delta r_{St}\}$ sequence to be unaffected by the long-term interest rate sequence, $\alpha_S$ and all
the $a_{12}(i)$ coefficients must be equal to zero. Of course, at least one of the speed of
adjustment terms in (6.11) and (6.12) must be nonzero. If both $\alpha_S$ and $\alpha_L$ are equal to
zero, the long-run equilibrium relationship does not appear and the model is not one of
error correction or cointegration.

The result can easily be generalized to the $n$-variable model. Formally, the $(n \times 1)$
vector of $I(1)$ variables $x_t = (x_{1t}, x_{2t}, \ldots, x_{nt})'$ has an error-correction representation if
it can be expressed in the form:

$$\Delta x_t = \pi_0 + \pi x_{t-1} + \pi_1 \Delta x_{t-1} + \pi_2 \Delta x_{t-2} + \cdots + \pi_p \Delta x_{t-p} + \varepsilon_t \tag{6.13}$$

where $\pi_0$ = an $(n \times 1)$ vector of intercept terms with elements $\pi_{i0}$
$\pi_i$ = $(n \times n)$ coefficient matrices with elements $\pi_{jk}(i)$
$\pi$ = a matrix with elements $\pi_{jk}$ such that one or more of the $\pi_{jk} \neq 0$
$\varepsilon_t$ = an $(n \times 1)$ vector with elements $\varepsilon_{it}$

Note that the disturbance terms are such that $\varepsilon_{it}$ may be correlated with $\varepsilon_{jt}$.

Let all variables in $x_t$ be $I(1)$. Now, if there is an error-correction representation
of these variables as in (6.13), there is necessarily a linear combination of the $I(1)$
variables that is stationary. Solving (6.13) for $\pi x_{t-1}$ yields

$$\pi x_{t-1} = \Delta x_t - \pi_0 - \sum \pi_i \Delta x_{t-i} - \varepsilon_t$$

Since each expression on the right-hand side is stationary, $\pi x_{t-1}$ must also be
stationary. Since $\pi$ contains only constants, each row of $\pi$ is a cointegrating vector
of $x_t$. For example, the first row can be written as $(\pi_{11} x_{1t-1} + \pi_{12} x_{2t-1} + \cdots + \pi_{1n} x_{nt-1})$.
Since each series is $I(1)$, $(\pi_{11}, \pi_{12}, \ldots, \pi_{1n})$ must be a cointegrating vector for $x_t$.

The key feature in (6.13) is the presence of the matrix $\pi$. There are two important
points to note:

1. If all elements of $\pi$ equal zero, (6.13) is a traditional VAR in first differences.
In such circumstances there is no error-correction representation since $\Delta x_t$
does not respond to the previous period's deviation from long-run equilibrium.

2. If one or more of the $\pi_{jk}$ differs from zero, $\Delta x_t$ responds to the previous
period's deviation from long-run equilibrium. Hence, estimating $x_t$ as a VAR
in first differences is inappropriate if $x_t$ has an error-correction representation.
The omission of the expression $\pi x_{t-1}$ entails a misspecification error if
$x_t$ has an error-correction representation as in (6.13).


A good way to examine the relationship between cointegration and error correction
is to study the properties of the simple VAR model:

$$y_t = a_{11} y_{t-1} + a_{12} z_{t-1} + \varepsilon_{yt} \tag{6.14}$$
$$z_t = a_{21} y_{t-1} + a_{22} z_{t-1} + \varepsilon_{zt} \tag{6.15}$$

where $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are white-noise disturbances that may be correlated with each other
and, for simplicity, intercept terms have been ignored. Using lag operators, we can write
(6.14) and (6.15) as

$$(1 - a_{11}L) y_t - a_{12} L z_t = \varepsilon_{yt}$$
$$-a_{21} L y_t + (1 - a_{22}L) z_t = \varepsilon_{zt}$$

The next step is to solve for $y_t$ and $z_t$. Writing the system in matrix form, we obtain

$$\begin{bmatrix} (1 - a_{11}L) & -a_{12}L \\ -a_{21}L & (1 - a_{22}L) \end{bmatrix} \begin{bmatrix} y_t \\ z_t \end{bmatrix} = \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}$$

Using Cramer's Rule or matrix inversion, we can obtain the solutions for $y_t$ and $z_t$ as

$$y_t = \frac{(1 - a_{22}L)\varepsilon_{yt} + a_{12}L\varepsilon_{zt}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2} \tag{6.16}$$

$$z_t = \frac{a_{21}L\varepsilon_{yt} + (1 - a_{11}L)\varepsilon_{zt}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2} \tag{6.17}$$

We have converted the two-variable first-order system represented by (6.14) and
(6.15) into two univariate second-order difference equations of the type examined in
Chapter 2. Note that both variables have the same inverse characteristic equation:
$(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2$. Setting $(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2 = 0$ and solving
for $L$ yields the two roots of the inverse characteristic equation. In order to work with
the characteristic roots (as opposed to the inverse characteristic roots), define $\lambda = 1/L$
and write the characteristic equation as

$$\lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}) = 0 \tag{6.18}$$
Since the two variables have the same characteristic equation, the characteristic
roots of (6.18) determine the time paths of both variables. The following remarks
summarize the time paths of $\{y_t\}$ and $\{z_t\}$:

1. If both characteristic roots $(\lambda_1, \lambda_2)$ lie inside the unit circle, (6.16) and (6.17)
yield stable solutions for $\{y_t\}$ and $\{z_t\}$. If $t$ is sufficiently large or if the initial
conditions are such that the homogeneous solution is zero, the stability
condition guarantees that the variables are stationary. The variables cannot be
cointegrated of order (1, 1) since each is stationary.

2. If either root lies outside the unit circle, the solutions are explosive. Neither
variable is difference stationary, so they cannot be $CI(1, 1)$. In the same way,
if both characteristic roots are unity, the second difference of each variable
will be stationary. Since each is $I(2)$, the variables cannot be $CI(1, 1)$.

3. As you can see from (6.14) and (6.15), if $a_{12} = a_{21} = 0$, the solution is trivial.
For $\{y_t\}$ and $\{z_t\}$ to be unit root processes, it is necessary for $a_{11} = a_{22} = 1$.
It follows that $\lambda_1 = \lambda_2 = 1$ and that the two variables evolve without any
long-run equilibrium relationship; hence, the variables cannot be cointegrated.

4. For $\{y_t\}$ and $\{z_t\}$ to be $CI(1, 1)$, it is necessary for one characteristic root to
be unity and the other to be less than unity in absolute value. In this instance,
each variable will have the same stochastic trend and the first difference of
each variable will be stationary. For example, if $\lambda_1 = 1$, (6.16) will have the
form:

$$y_t = \frac{(1 - a_{22}L)\varepsilon_{yt} + a_{12}L\varepsilon_{zt}}{(1 - L)(1 - \lambda_2 L)}$$

or, multiplying by $(1 - L)$, we get

$$(1 - L)y_t = \Delta y_t = \frac{(1 - a_{22}L)\varepsilon_{yt} + a_{12}L\varepsilon_{zt}}{1 - \lambda_2 L}$$

which is stationary if $|\lambda_2| < 1$.
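The four remarks can be turned into a small routine that computes the roots of (6.18) and sorts a given coefficient set into one of the cases. This is a sketch only: the function names, the tolerance `tol`, and the returned labels are assumptions introduced here, and the final category lumps together every unit-modulus case that is not $CI(1, 1)$.

```python
import cmath


def characteristic_roots(a11, a12, a21, a22):
    """Roots of lambda^2 - (a11 + a22)*lambda + (a11*a22 - a12*a21) = 0."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(trace ** 2 - 4.0 * det)  # may be complex
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def classify(a11, a12, a21, a22, tol=1e-9):
    """Classify the bivariate first-order VAR using the remarks in the text."""
    roots = sorted(characteristic_roots(a11, a12, a21, a22), key=abs, reverse=True)
    lam1, lam2 = roots
    if abs(lam1) < 1.0 - tol:
        return "stationary"
    if abs(lam1) > 1.0 + tol:
        return "explosive"
    # The largest root has modulus one (within the tolerance).
    if abs(lam1 - 1.0) < tol and abs(lam2) < 1.0 - tol:
        return "CI(1, 1)"
    return "unit roots without CI(1, 1)"


# Hypothetical coefficient sets, one for each outcome.
print(classify(0.5, 0.0, 0.0, 0.5))
print(classify(1.2, 0.0, 0.0, 0.5))
print(classify(1.0, 0.0, 0.0, 1.0))
print(classify(0.8, 0.2, 0.2, 0.8))
```

In applied work the coefficients are estimated rather than known, so an exact unit root is never observed; the tolerance here only guards against floating-point rounding and is not a substitute for a formal test.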


Thus, to ensure that the variables are $CI(1, 1)$, we must set one of the characteristic
roots equal to unity and the other to a value that is less than unity in absolute value. For
the larger of the two roots to equal unity, the quadratic formula indicates that

$$0.5(a_{11} + a_{22}) + 0.5\sqrt{a_{11}^2 + a_{22}^2 - 2a_{11}a_{22} + 4a_{12}a_{21}} = 1$$

so that after some simplification, the coefficients are seen to satisfy²

$$a_{11} = [(1 - a_{22}) - a_{12}a_{21}]/(1 - a_{22}) \tag{6.19}$$

Now consider the second characteristic root. Since $a_{12}$ and/or $a_{21}$ must differ from
zero if the variables are cointegrated, the condition $|\lambda_2| < 1$ requires

$$a_{22} > -1 \tag{6.20}$$

and

$$a_{12}a_{21} + (a_{22})^2 < 1 \tag{6.21}$$
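Restriction (6.19) can be checked numerically: choose values for the other three coefficients, compute the implied $a_{11}$, and confirm that $\lambda = 1$ solves the characteristic equation (6.18). The sketch below uses hypothetical coefficient values and a helper name, `a11_from_restriction`, introduced here for illustration; the second root is recovered from the fact that the two roots sum to $a_{11} + a_{22}$.

```python
def a11_from_restriction(a12, a21, a22):
    """Value of a11 implied by (6.19), which forces one root to unity."""
    if a22 == 1.0:
        raise ValueError("(6.19) is undefined when a22 equals one")
    return ((1.0 - a22) - a12 * a21) / (1.0 - a22)


a12, a21, a22 = 0.2, 0.2, 0.8  # hypothetical values
a11 = a11_from_restriction(a12, a21, a22)

trace = a11 + a22
det = a11 * a22 - a12 * a21

# lambda = 1 solves (6.18) exactly when 1 - trace + det = 0.
char_poly_at_one = 1.0 - trace + det

# The roots of (6.18) sum to the trace, so the other root is trace - 1.
lambda_2 = trace - 1.0
second_root_is_stable = abs(lambda_2) < 1.0
print(a11, char_poly_at_one, lambda_2, second_root_is_stable)
```

For these values the implied $a_{11}$ is approximately 0.8 and the second root is approximately 0.6, so the system has one unit root and one stable root.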


Equations (6.19), (6.20), and (6.21) are restrictions we must place on the coefficients
of (6.14) and (6.15) if we want to ensure that the variables are cointegrated of
order (1, 1). To see how these coefficient restrictions bear on the nature of the solution,
write (6.14) and (6.15) as

$$\begin{bmatrix} \Delta y_t \\ \Delta z_t \end{bmatrix} = \begin{bmatrix} a_{11} - 1 & a_{12} \\ a_{21} & a_{22} - 1 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix} \tag{6.22}$$

Now, (6.19) implies that $a_{11} - 1 = -a_{12}a_{21}/(1 - a_{22})$ so that after a bit of manipulation,
(6.22) can be written in the form

$$\Delta y_t = -[a_{12}a_{21}/(1 - a_{22})]y_{t-1} + a_{12}z_{t-1} + \varepsilon_{yt} \tag{6.23}$$
$$\Delta z_t = a_{21}y_{t-1} - (1 - a_{22})z_{t-1} + \varepsilon_{zt} \tag{6.24}$$

Equations (6.23) and (6.24) form an error-correction model. If both $a_{12}$ and $a_{21}$ differ
from zero, we can normalize the cointegrating vector with respect to either variable.
Normalizing with respect to $y_t$, we get

$$\Delta y_t = \alpha_y (y_{t-1} - \beta z_{t-1}) + \varepsilon_{yt}$$
$$\Delta z_t = \alpha_z (y_{t-1} - \beta z_{t-1}) + \varepsilon_{zt}$$

where $\alpha_y = -a_{12}a_{21}/(1 - a_{22})$
$\beta = (1 - a_{22})/a_{21}$
$\alpha_z = a_{21}$

You can see that $y_t$ and $z_t$ change in response to the previous period's deviation from
the long-run equilibrium $y_{t-1} - \beta z_{t-1}$. If $y_{t-1} = \beta z_{t-1}$, $y_t$ and $z_t$ change only in response
to $\varepsilon_{yt}$ and $\varepsilon_{zt}$ shocks. Moreover, if $\alpha_y < 0$ and $\alpha_z > 0$, $y_t$ decreases and $z_t$ increases in
response to a positive deviation from long-run equilibrium. You should also be able to
convince yourself that conditions (6.20) and (6.21) ensure that $\beta \neq 0$ and that at least
one of the speed of adjustment parameters (i.e., $\alpha_y$ and $\alpha_z$) is not equal to zero. Now,
refer to (6.9) and (6.10); you can see this model is in exactly the same form as the
interest rate example presented in the beginning of this section.
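The equivalence between the restricted VAR and its error-correction form can be verified for particular numbers. The sketch below maps VAR coefficients into the speed of adjustment parameters and the cointegrating parameter, then compares the one-period changes implied by the two representations with the shocks set to zero. The coefficient values, the lagged values, and the helper name `ecm_parameters` are hypothetical, and the mapping assumes $a_{21} \neq 0$ and $a_{22} \neq 1$.

```python
def ecm_parameters(a12, a21, a22):
    """Map VAR coefficients into (alpha_y, alpha_z, beta) of the ECM."""
    alpha_y = -a12 * a21 / (1.0 - a22)
    alpha_z = a21
    beta = (1.0 - a22) / a21
    return alpha_y, alpha_z, beta


a12, a21, a22 = 0.2, 0.2, 0.8           # hypothetical values
a11 = 1.0 - a12 * a21 / (1.0 - a22)     # a11 implied by restriction (6.19)
alpha_y, alpha_z, beta = ecm_parameters(a12, a21, a22)

y_lag, z_lag = 1.5, 0.5                 # arbitrary lagged values

# Changes implied by the VAR in levels (shocks set to zero).
dy_var = a11 * y_lag + a12 * z_lag - y_lag
dz_var = a21 * y_lag + a22 * z_lag - z_lag

# Changes implied by the error-correction form (shocks set to zero).
deviation = y_lag - beta * z_lag
dy_ecm = alpha_y * deviation
dz_ecm = alpha_z * deviation

# Determinant of the matrix multiplying the lagged levels in (6.22).
det_pi = (a11 - 1.0) * (a22 - 1.0) - a12 * a21
print(dy_var, dy_ecm, dz_var, dz_ecm, det_pi)
```

Here the deviation is positive, so $y_t$ falls and $z_t$ rises, and the determinant of the matrix multiplying the lagged levels is zero up to rounding.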

Although $\alpha_y$ and $\alpha_z$ cannot both equal zero, an interesting special case arises if
one of these coefficients is zero. For example, if we set $a_{12} = 0$, the speed of adjustment
coefficient $\alpha_y$ equals zero. In this case, $y_t$ changes only in response to $\varepsilon_{yt}$ as $\Delta y_t = \varepsilon_{yt}$.
The $\{z_t\}$ sequence does all of the correction to eliminate any deviation from long-run
equilibrium. Since $\{y_t\}$ does not do any of the error-correcting, $\{y_t\}$ is said to be weakly
exogenous.

To highlight some of the important implications of this simple model, we have
shown the following:


1. The restrictions necessary to ensure that the variables are $CI(1, 1)$ guarantee
that an error-correction model exists. In our example, both $\{y_t\}$ and $\{z_t\}$
are unit root processes but the linear combination $y_t - \beta z_t$ is stationary; the
normalized cointegrating vector is $[1, -(1 - a_{22})/a_{21}]$. The variables have
an error-correction representation with speed of adjustment coefficients
$\alpha_y = -a_{12}a_{21}/(1 - a_{22})$ and $\alpha_z = a_{21}$. It was also shown that an error-correction
model for $I(1)$ variables necessarily implies cointegration. This finding illustrates
the Granger representation theorem stating that for any set of $I(1)$
variables, error correction and cointegration are equivalent representations.


2. Cointegration necessitates coefficient restrictions in a VAR model. It is
important to realize that a cointegrated system can be viewed as a restricted
form of a general VAR model. Let $x_t = (y_t, z_t)'$ and $\varepsilon_t = (\varepsilon_{yt}, \varepsilon_{zt})'$ so that we
can write (6.22) in the form

$$\Delta x_t = \pi x_{t-1} + \varepsilon_t \tag{6.25}$$

Clearly, it is inappropriate to estimate a VAR of cointegrated variables
using only first differences. Estimating (6.25) without the expression $\pi x_{t-1}$
would eliminate the error-correction portion of the model. It is also important
to note that the rows of $\pi$ are not linearly independent if the variables are
cointegrated. Multiplying each element in row 1 by $-(1 - a_{22})/a_{12}$ yields
the corresponding element in row 2. Thus, the determinant of $\pi$ is equal to
zero, and $y_t$ and $z_t$ have the error-correction representation given by (6.23)
and (6.24).

This two-variable example illustrates the very important insights of
Johansen (1988) and Stock and Watson (1988) that we can use the rank of $\pi$ to
determine whether or not two variables $\{y_t\}$ and $\{z_t\}$ are cointegrated. Compare
the determinant of $\pi$ to the characteristic equation given by (6.18). If the
largest characteristic root equals unity ($\lambda_1 = 1$), it follows that the determinant
of $\pi$ is zero and that $\pi$ has a rank equal to unity. If $\pi$ were to have a rank
of zero, it would be necessary for $a_{11} = 1$, $a_{22} = 1$, and $a_{12} = a_{21} = 0$. The
VAR represented by (6.14) and (6.15) would be nothing more than $\Delta y_t = \varepsilon_{yt}$
and $\Delta z_t = \varepsilon_{zt}$. In this case, both the $\{y_t\}$ and $\{z_t\}$ sequences are unit root
processes without any cointegrating vector. Finally, if the rank of $\pi$ is full, then
neither characteristic root can be unity, so the $\{y_t\}$ and $\{z_t\}$ sequences are
jointly stationary. After all, if there are two independent stationary relations
for $\{y_t\}$ and $\{z_t\}$, both variables must be stationary.

3. In general, both variables in a cointegrated system will respond to a deviation
from long-run equilibrium. However, it is possible that one (but not both) of
the speed of adjustment parameters is zero. For example, if $\alpha_y = 0$, $\{y_t\}$ does
not respond to the discrepancy from long-run equilibrium and $\{z_t\}$ does all
of the adjustment. In this circumstance, $\{y_t\}$ is weakly exogenous because it
does none of the error correction. As such, an econometric model for $\{z_t\}$ can
be estimated and hypothesis testing can be conducted without reference to a
specific model for $\{y_t\}$. Section 10 and Appendix 6.2 consider modeling in a
cointegrated system when a variable is weakly exogenous.

Also, it is necessary to reinterpret Granger causality in a cointegrated
system. In a cointegrated system, $\{y_t\}$ does not Granger cause $\{z_t\}$ if lagged
values $\Delta y_{t-i}$ do not enter the $\Delta z_t$ equation and if $z_t$ does not respond to the
deviation from long-run equilibrium. Hence, $\{z_t\}$ must be weakly exogenous.
If $a_{21} = 0$ in (6.24), so that $\alpha_z = 0$, $\{z_t\}$ is weakly exogenous and is not Granger
caused by $\{y_t\}$. Similarly, in the cointegrated system of (6.11) and (6.12), $\{r_{Lt}\}$ does
not Granger cause $\{r_{St}\}$ if all $a_{12}(i) = 0$ and if $\alpha_S = 0$.
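
The three rank cases for the two-variable system can be sketched in a few lines. This is only an illustration: the function names `rank_2x2` and `classify` and the numerical tolerance are assumptions introduced here, and the labels simply restate the cases described in the text for the VAR of (6.14) and (6.15).

```python
def rank_2x2(m, tol=1e-12):
    """Rank of a 2x2 matrix given as nested lists (tolerance is an assumption)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) > tol:
        return 2
    if any(abs(v) > tol for row in m for v in row):
        return 1
    return 0


def classify(a11, a12, a21, a22):
    """Map the VAR coefficients to the rank of pi = A - I and its interpretation."""
    pi = [[a11 - 1.0, a12], [a21, a22 - 1.0]]
    r = rank_2x2(pi)
    labels = {
        0: "unit roots, no cointegrating vector (VAR in first differences)",
        1: "cointegrated with one cointegrating vector",
        2: "full rank: the stationary case described in the text",
    }
    return r, labels[r]


# Illustrative coefficient sets (assumed values).
rank_zero = classify(1.0, 0.0, 0.0, 1.0)
rank_one = classify(0.8, 0.2, 0.2, 0.8)
rank_two = classify(0.5, 0.1, 0.1, 0.5)
```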


The n-Variable Case 


Little is altered in the n-variable case. The relationship between cointegration, error
correction, and the rank of the matrix $\pi$ is invariant to adding additional variables to
the system. The interesting feature introduced in the n-variable case is the possibility
of multiple cointegrating vectors. Now consider a more general version of (6.25):

$$x_t = A_1 x_{t-1} + \varepsilon_t \tag{6.26}$$

where $x_t$ = the $(n \times 1)$ vector $(x_{1t}, x_{2t}, \ldots, x_{nt})'$
      $\varepsilon_t$ = the $(n \times 1)$ vector $(\varepsilon_{1t}, \varepsilon_{2t}, \ldots, \varepsilon_{nt})'$
      $A_1$ = an $(n \times n)$ matrix of parameters

Subtracting $x_{t-1}$ from each side of (6.26) and letting $I$ be an $(n \times n)$ identity matrix,
we get

$$\Delta x_t = -(I - A_1)x_{t-1} + \varepsilon_t = \pi x_{t-1} + \varepsilon_t \tag{6.27}$$

where $\pi$ is the $(n \times n)$ matrix $-(I - A_1)$ and $\pi_{ij}$ denotes the element in row $i$ and column
$j$ of $\pi$. As you can see, (6.27) is a special case of (6.13) such that all $\pi_i = 0$.

Again, the crucial issue for cointegration concerns the rank of the $(n \times n)$ matrix $\pi$.
The only way for the rank of a matrix to be zero is for each of its elements to be zero.
Hence, if the rank of $\pi$ is zero, each element of $\pi$ must equal zero so that there are no
cointegrating vectors. In this instance, (6.27) is equivalent to an n-variable VAR in first
differences:

$$\Delta x_t = \varepsilon_t$$

Here, each $\Delta x_{it} = \varepsilon_{it}$, so that all the $\{x_{it}\}$ sequences are unit root processes and
there is no linear combination of the variables that is stationary.

At the other extreme, suppose that $\pi$ is of full rank. The long-run solution to (6.27)
is given by the n independent equations:

$$\begin{aligned}
\pi_{11}x_{1t} + \pi_{12}x_{2t} + \pi_{13}x_{3t} + \cdots + \pi_{1n}x_{nt} &= 0 \\
\pi_{21}x_{1t} + \pi_{22}x_{2t} + \pi_{23}x_{3t} + \cdots + \pi_{2n}x_{nt} &= 0 \\
&\;\;\vdots \\
\pi_{n1}x_{1t} + \pi_{n2}x_{2t} + \pi_{n3}x_{3t} + \cdots + \pi_{nn}x_{nt} &= 0
\end{aligned} \tag{6.28}$$

Each of these n equations is an independent restriction on the long-run solution of
the variables; the n variables in the system face n long-run constraints. In this case, each
of the n variables contained in the vector $x_t$ must be stationary with the long-run values
given by the solution to (6.28). The variables cannot be $CI(1, 1)$ since all are stationary.

In intermediate cases, in which the rank of $\pi$ is equal to $r < n$, there are $r$ cointegrating
vectors. With $r$ independent equations and $n$ variables, there are $n - r$ stochastic
trends in the system. If $r = 1$, there is a single cointegrating vector given by any row
of the matrix $\pi$. Each $\{\Delta x_{it}\}$ sequence can be written in error-correction form. For
example, we can write $\Delta x_{1t}$ as

$$\Delta x_{1t} = \pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1} + \varepsilon_{1t}$$

or, normalizing with respect to $x_{1t-1}$, we can set $\alpha_1 = \pi_{11}$ and $\beta_{1j} = \pi_{1j}/\pi_{11}$ to obtain

$$\Delta x_{1t} = \alpha_1(x_{1t-1} + \beta_{12}x_{2t-1} + \cdots + \beta_{1n}x_{nt-1}) + \varepsilon_{1t} \tag{6.29}$$

In the long run, the $\{x_{it}\}$ will satisfy the relationship

$$x_{1t} + \beta_{12}x_{2t} + \cdots + \beta_{1n}x_{nt} = 0$$

Hence, the normalized cointegrating vector is $(1, \beta_{12}, \beta_{13}, \ldots, \beta_{1n})$ and the speed
of adjustment parameter is $\alpha_1$. In the same way, with two cointegrating vectors the
long-run values of the variables will satisfy the two relationships

$$\pi_{11}x_{1t} + \pi_{12}x_{2t} + \cdots + \pi_{1n}x_{nt} = 0$$

$$\pi_{21}x_{1t} + \pi_{22}x_{2t} + \cdots + \pi_{2n}x_{nt} = 0$$

which can be appropriately normalized.
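
As a rough numerical illustration of the rank condition in the n-variable case, the sketch below builds a three-variable $\pi$ with a single cointegrating vector and recovers the normalized vector from its first row. The vectors `alpha` and `beta` are assumed values chosen only for the example, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical 3-variable system: pi = alpha * beta' has exactly one cointegrating vector.
alpha = np.array([[-0.5], [0.2], [0.1]])   # speed-of-adjustment coefficients (assumed)
beta = np.array([[1.0, -1.0, -1.0]])       # cointegrating vector (assumed)
pi = alpha @ beta                          # (3 x 3) matrix of rank 1
A1 = np.eye(3) + pi                        # because pi = -(I - A1)

n = pi.shape[0]
r = int(np.linalg.matrix_rank(pi))         # number of cointegrating vectors
n_trends = n - r                           # number of stochastic trends

# With r = 1, any nonzero row of pi is proportional to the cointegrating vector.
# Normalizing row 1 on x_1 gives (1, beta_12, beta_13) with alpha_1 = pi_11.
alpha_1 = pi[0, 0]
normalized = pi[0] / pi[0, 0]

# The characteristic roots of A1: n - r of them equal unity.
roots = np.sort(np.linalg.eigvals(A1).real)
```

Here `r` is 1 and `n_trends` is 2, and two of the three characteristic roots of `A1` are equal to one, which matches the count of stochastic trends.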

The main point here is that there are three important ways to test for cointegration.
The Engle-Granger methodology seeks to determine whether the residuals of
the equilibrium relationship are stationary. The Johansen (1988) methodology determines
the rank of $\pi$ and the error-correction method examines the speed of adjustment
coefficients. The Engle-Granger approach is the subject of the next three sections.
Sections 7 through 9 examine the Johansen (1988) methodology, and testing within the
error-correction framework is examined in Section 10.


4. TESTING FOR COINTEGRATION: 
THE ENGLE-GRANGER METHODOLOGY 


To explain the Engle-Granger testing procedure, let's begin with the type of problem
likely to be encountered in applied studies. Suppose that two variables (say $y_t$ and
$z_t$) are believed to be integrated of order 1 and we want to determine whether there
exists an equilibrium relationship between the two. Engle and Granger (1987) propose a
four-step procedure to determine if two $I(1)$ variables are cointegrated of order $CI(1, 1)$.


STEP 1: Pretest the variables for their order of integration. By definition, cointegration
necessitates that two variables be integrated of the same order. Thus, the
first step in the analysis is to pretest each variable to determine its order of
integration. The augmented Dickey-Fuller tests discussed in Chapter 4 can
be used to infer the number of unit roots (if any) in each of the variables. If
both variables are stationary, it is not necessary to proceed since standard
time-series methods apply to stationary variables. If the variables are integrated
of different orders, it is possible to conclude they are not cointegrated.
However, as detailed in Section 5, if you have more than two variables such
that some are $I(1)$ and some are $I(2)$, you may want to determine whether the
variables are multicointegrated.

STEP 2: Estimate the long-run equilibrium relationship. If the results of Step 1 indicate
that both $\{y_t\}$ and $\{z_t\}$ are $I(1)$, the next step is to estimate the long-run
equilibrium relationship in the form

$$y_t = \beta_0 + \beta_1 z_t + e_t \tag{6.30}$$


If the variables are cointegrated, an OLS regression yields a
"super-consistent" estimator of the cointegrating parameters $\beta_0$ and
$\beta_1$. Stock (1987) proves that the OLS estimates of $\beta_0$ and $\beta_1$ converge
faster than they do in OLS models using stationary variables. To explain,
reexamine the scatter plot shown in Figure 6.1. You can see that the effect of
the common trend dominates the effect of the stationary component; both
variables seem to rise and fall in tandem. Hence, there is a strong linear
relationship as shown by the regression line drawn in the figure.
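
The faster convergence can be seen in a small Monte Carlo experiment. The sketch below is an illustration under assumed settings (a slope of 2, standard normal shocks, 200 replications, and sample sizes of 50 and 800); none of these values come from the text, and the function name `mean_abs_slope_error` is hypothetical. It regresses a cointegrated $y_t$ on a random walk $z_t$ and records the average absolute error of the slope estimate.

```python
import numpy as np


def mean_abs_slope_error(T, reps=200, beta1=2.0, seed=0):
    """Average |b1_hat - beta1| from OLS of y on (1, z), where z is a random walk."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(reps):
        z = np.cumsum(rng.standard_normal(T))            # I(1) regressor
        y = 1.0 + beta1 * z + rng.standard_normal(T)     # cointegrated with z
        X = np.column_stack([np.ones(T), z])
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        errors.append(abs(b[1] - beta1))
    return float(np.mean(errors))


err_small = mean_abs_slope_error(T=50)
err_large = mean_abs_slope_error(T=800)
```

With stationary regressors, a 16-fold increase in the sample size would be expected to cut the typical estimation error by a factor of about 4. In this experiment the error should fall by considerably more than that, which is the sense in which the estimator is super-consistent.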

In order to determine if the variables are actually cointegrated, denote
the residual sequence from this equation by $\{\hat{e}_t\}$. Thus, the $\{\hat{e}_t\}$ series contains
the estimated values of the deviations from the long-run relationship. If
these deviations are found to be stationary, the $\{y_t\}$ and $\{z_t\}$ sequences are
cointegrated of order (1, 1). It would be convenient if we could perform a
Dickey-Fuller test on these residuals to determine their order of integration.
Consider the autoregression of the residuals:

$$\Delta\hat{e}_t = a_1\hat{e}_{t-1} + \varepsilon_t \tag{6.31}$$


Since the $\{\hat{e}_t\}$ sequence is a residual from a regression equation (with
a mean necessarily equal to zero), there is no need to include an intercept
term; the parameter of interest in (6.31) is $a_1$. If we cannot reject the null
hypothesis $a_1 = 0$, we can conclude that the residual series contains a unit
root. Hence, we conclude that the $\{y_t\}$ and $\{z_t\}$ sequences are not cointegrated.
The more precise wording is awkward because of a triple negative,
but to be technically correct, if it is not possible to reject the null hypothesis
$a_1 = 0$, we cannot reject the hypothesis that the variables are not cointegrated.
Instead, the rejection of the null hypothesis implies that the residual
sequence is stationary. Given that $\{y_t\}$ and $\{z_t\}$ were both found to be $I(1)$
and that the residuals are stationary, we can conclude that the series are
cointegrated of order (1, 1).
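
The mechanics of Step 2 can be sketched as follows. This is a minimal illustration on simulated data: the helper names `ols` and `residual_df_tstat`, the seed, and the data-generating values are all assumptions. The function estimates (6.30) by OLS, saves the residuals, and computes the t-statistic for $a_1 = 0$ in (6.31) without an intercept. Because the residuals are estimated, the statistic has to be judged against critical values designed for this test rather than the ordinary Dickey-Fuller tables.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and residuals."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b, y - X @ b


def residual_df_tstat(y, z):
    """t-statistic for a1 = 0 in (6.31), using the residuals from (6.30)."""
    T = len(y)
    _, e_hat = ols(np.column_stack([np.ones(T), z]), y)   # equilibrium regression
    de = np.diff(e_hat)                                   # change in the residual
    lag = e_hat[:-1]                                      # lagged residual
    a1 = (lag @ de) / (lag @ lag)                         # no intercept in (6.31)
    resid = de - a1 * lag
    s2 = (resid @ resid) / (len(de) - 1)
    return a1 / np.sqrt(s2 / (lag @ lag))


rng = np.random.default_rng(1)                    # arbitrary seed
z = np.cumsum(rng.standard_normal(200))           # random walk
y_coint = 0.5 + z + rng.standard_normal(200)      # shares the stochastic trend in z
y_indep = np.cumsum(rng.standard_normal(200))     # unrelated random walk

t_coint = residual_df_tstat(y_coint, z)
t_indep = residual_df_tstat(y_indep, z)
```

In the cointegrated case the statistic should be strongly negative; for the unrelated random walk it is typically much closer to zero, although any single draw can differ.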

In most applied studies it is not possible to use the Dickey-Fuller tables
themselves. The problem is that the $\{\hat{e}_t\}$ sequence is generated from a regression
equation; the researcher does not know the actual error $e_t$, only the
estimate of the error $\hat{e}_t$. The methodology of fitting the regression in (6.30)
selects values of $\beta_0$ and $\beta_1$ that minimize the sum of squared residuals. Since
the residual variance is made as small as possible, the procedure is prejudiced
toward finding a stationary error process in (6.31). Hence, the test
statistic used to test the magnitude of $a_1$ must reflect this fact. Only if $\beta_0$
and $\beta_1$ were known in advance and used to construct the true $\{e_t\}$ sequence
would an ordinary Dickey-Fuller table be appropriate. When you estimate
the cointegrating vector, use the critical values provided in Table C in the
Supplementary Manual. These critical values depend on sample size and the
number of variables used in the analysis. For example, to test for cointegration
between two variables using a sample size of 100, the critical value at
the 5% significance level is -3.398.

If the residuals of (6.31) do not appear to be white noise, an augmented
form of the test can be used instead of (6.31). Suppose that diagnostic checks
indicate that the $\{\varepsilon_t\}$ sequence of (6.31) exhibits serial correlation. Instead
of using the results from (6.31), estimate the autoregression:

$$\Delta\hat{e}_t = a_1\hat{e}_{t-1} + \sum_{i=1}^{n} a_{i+1}\Delta\hat{e}_{t-i} + \varepsilon_t \tag{6.32}$$

Again, if we reject the null hypothesis $a_1 = 0$, we can conclude that the
residual sequence is stationary and that the variables are cointegrated.

STEP 3: Estimate the error-correction model. If the variables are cointegrated (i.e.,
if the null hypothesis of no cointegration is rejected), the residuals from the
equilibrium regression can be used to estimate the error-correction model. If
$\{y_t\}$ and $\{z_t\}$ are $CI(1, 1)$, the variables have the error-correction form:


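$$\Delta y_t = \alpha_1 + \alpha_y[y_{t-1} - \beta_1 z_{t-1}] + \sum_{i=1} \alpha_{11}(i)\Delta y_{t-i} + \sum_{i=1} \alpha_{12}(i)\Delta z_{t-i} + \varepsilon_{yt} \tag{6.33}$$

$$\Delta z_t = \alpha_2 + \alpha_z[y_{t-1} - \beta_1 z_{t-1}] + \sum_{i=1} \alpha_{21}(i)\Delta y_{t-i} + \sum_{i=1} \alpha_{22}(i)\Delta z_{t-i} + \varepsilon_{zt} \tag{6.34}$$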


where $\beta_1$ = the parameter of the cointegrating vector given by (6.30), $\varepsilon_{yt}$
and $\varepsilon_{zt}$ = white-noise disturbances (which may be correlated with each
other), and $\alpha_1$, $\alpha_2$, $\alpha_y$, $\alpha_z$, $\alpha_{11}(i)$, $\alpha_{12}(i)$, $\alpha_{21}(i)$, $\alpha_{22}(i)$ are all parameters.
Engle and Granger (1987) propose a clever way to circumvent the
cross-equation restrictions involved in the direct estimation of (6.33) and
(6.34). The magnitude of the residual $\hat{e}_{t-1}$ is the deviation from long-run
equilibrium in period $(t - 1)$. Hence, it is possible to use the saved residuals
$\{\hat{e}_{t-1}\}$ obtained in Step 2 as an estimate of the expression $y_{t-1} - \beta_1 z_{t-1}$ in
(6.33) and (6.34). Thus, using the saved residuals from the estimation of the
long-run equilibrium relationship, estimate the error-correcting model as


$$\Delta y_t = \alpha_1 + \alpha_y\hat{e}_{t-1} + \sum_{i=1} \alpha_{11}(i)\Delta y_{t-i} + \sum_{i=1} \alpha_{12}(i)\Delta z_{t-i} + \varepsilon_{yt} \tag{6.35}$$

$$\Delta z_t = \alpha_2 + \alpha_z\hat{e}_{t-1} + \sum_{i=1} \alpha_{21}(i)\Delta y_{t-i} + \sum_{i=1} \alpha_{22}(i)\Delta z_{t-i} + \varepsilon_{zt} \tag{6.36}$$

Other than the error-correction term $\hat{e}_{t-1}$, (6.35) and (6.36) constitute a
VAR in first differences. This VAR can be estimated using the same methodology
developed in Chapter 5. All of the procedures developed for a VAR
apply to the system represented by the error-correction equations. Notably:

1. OLS is an efficient estimation strategy since each equation contains the
same set of regressors.

2. Since all terms in (6.35) and (6.36) are stationary [i.e., $\Delta y_t$ and its lags,
$\Delta z_t$ and its lags, and $\hat{e}_{t-1}$ are $I(0)$], the test statistics used in traditional
VAR analysis are appropriate for (6.35) and (6.36). For example, lag
lengths can be determined using a $\chi^2$-test, and the restriction that all
$\alpha_{jk}(i) = 0$ can be checked using an F-test. If there is a single cointegrating
vector, restrictions concerning $\alpha_y$ or $\alpha_z$ can be conducted using a t-test.
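
A minimal sketch of the two-step estimation of (6.35) and (6.36) follows. The function name `estimate_ecm`, the lag length, the seed, and the simulated data-generating process (a true speed of adjustment of -0.5 for $y_t$ and zero for $z_t$, with $\beta_1 = 1$) are assumptions made for illustration only. Each equation is estimated by OLS on the same set of regressors.

```python
import numpy as np


def estimate_ecm(y, z, n_lags=1):
    """Two-step estimate of (6.35)-(6.36) by equation-by-equation OLS."""
    T = len(y)
    # Equilibrium regression (6.30); keep the residuals as the error-correction term.
    X0 = np.column_stack([np.ones(T), z])
    b0 = np.linalg.lstsq(X0, y, rcond=None)[0]
    e_hat = y - X0 @ b0
    dy, dz = np.diff(y), np.diff(z)            # dy[k] = y[k+1] - y[k]
    rows = range(n_lags, T - 1)                # index k of the dependent change
    # Regressors: constant, lagged residual, lagged changes in y and in z.
    X = np.array([[1.0, e_hat[k]]
                  + [dy[k - i] for i in range(1, n_lags + 1)]
                  + [dz[k - i] for i in range(1, n_lags + 1)]
                  for k in rows])
    coef_y = np.linalg.lstsq(X, dy[n_lags:], rcond=None)[0]
    coef_z = np.linalg.lstsq(X, dz[n_lags:], rcond=None)[0]
    return {"beta": b0, "alpha_y": coef_y[1], "alpha_z": coef_z[1],
            "coef_y": coef_y, "coef_z": coef_z}


# Simulated example (assumed process): z is a random walk and y error-corrects toward z.
rng = np.random.default_rng(2)
T = 500
y = np.zeros(T)
z = np.zeros(T)
for t in range(1, T):
    z[t] = z[t - 1] + rng.standard_normal()
    y[t] = y[t - 1] - 0.5 * (y[t - 1] - z[t - 1]) + rng.standard_normal()

ecm = estimate_ecm(y, z)
```

In this simulated system `ecm["alpha_y"]` should be close to -0.5 and `ecm["alpha_z"]` close to zero, so that $\{z_t\}$ behaves as the weakly exogenous variable.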

STEP 4: Assess model adequacy. There are several procedures that can help determine
whether the estimated error-correction model is appropriate.

1. You should be careful to assess the adequacy of the model by performing
diagnostic checks to determine whether the residuals of the
error-correction equations approximate white noise. If the residuals are
serially correlated, lag lengths may be too short. Reestimate the model
using lag lengths that yield serially uncorrelated errors. It may be that you
need to allow longer lags of some variables than of others. If so, you can
gain efficiency by estimating the near-VAR using the seemingly unrelated
regressions (SUR) method. Out-of-sample forecasting exercises are also
useful ways to select among alternative models.

2. The speed of adjustment coefficients $\alpha_y$ and $\alpha_z$ are of particular interest
in that they have important implications for the dynamics of the system.
As shown in Section 3, the values of $\alpha_y$ and $\alpha_z$ are directly related to the
characteristic roots of the difference equation system. Direct convergence
necessitates that $\alpha_y$ be negative and $\alpha_z$ be positive. If we focus on (6.36)
it is clear that for any given value of $\hat{e}_{t-1}$, a large value of $\alpha_z$ is associated
with a large value of $\Delta z_t$. If $\alpha_z$ is zero, the change in $z_t$ does not at all
respond to the deviation from long-run equilibrium in $(t - 1)$. If $\alpha_z$ is
zero and if all $\alpha_{21}(i) = 0$, then it can be said that $\{\Delta y_t\}$ does not Granger
cause $\{\Delta z_t\}$. We know that $\alpha_y$ and/or $\alpha_z$ should be significantly different
from zero if the variables are cointegrated. After all, if both $\alpha_y$ and $\alpha_z$ are
zero, there is no error correction and (6.35) and (6.36) comprise nothing
more than a VAR in first differences. Moreover, the absolute values
of these speed of adjustment coefficients must not be too large. The
point estimates should imply that $\Delta y_t$ and $\Delta z_t$ converge to the long-run
equilibrium relationship.

If all but one variable is weakly exogenous, you may want to
estimate that variable using the error-correction technique described in
Section 10.

3. As in a traditional VAR analysis, Lütkepohl and Reimers (1992)
show that innovation accounting (i.e., impulse responses and variance
decomposition analysis) can be used to obtain information concerning
the interactions among the variables. As a practical matter, the two
innovations $\varepsilon_{yt}$ and $\varepsilon_{zt}$ may be contemporaneously correlated if $y_t$ has a
contemporaneous effect on $z_t$ and/or if $z_t$ has a contemporaneous effect on
$y_t$. In obtaining impulse response functions and variance decompositions,
some method (such as a Choleski decomposition) must be used to
orthogonalize the innovations.

The shape of the impulse response functions and the results of the
variance decompositions can indicate whether the dynamic responses of
the variables conform to theory. Since all variables in (6.35) and (6.36)
are $I(0)$, the impulse responses of $\Delta y_t$ and $\Delta z_t$ should converge to zero.
You should reexamine your results from each step if you obtain a nondecaying
or explosive impulse response function.
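
The orthogonalization step can be illustrated with a short sketch. The covariance matrix `sigma` of the two innovations is a made-up example (an assumption, not an estimate from the text), and the ordering places $y_t$ first. The lower-triangular Choleski factor transforms the correlated innovations into shocks with an identity covariance matrix.

```python
import numpy as np

# Hypothetical covariance matrix of (eps_yt, eps_zt); the values are assumed.
sigma = np.array([[1.0, 0.4],
                  [0.4, 0.5]])

# Lower-triangular factor P with P @ P.T equal to sigma.
P = np.linalg.cholesky(sigma)

# Orthogonalized shocks u_t = P^{-1} eps_t have an identity covariance matrix.
P_inv = np.linalg.inv(P)
cov_u = P_inv @ sigma @ P_inv.T

# The columns of P are the impact responses to one-standard-deviation orthogonal shocks.
# With y ordered first, the zero in the upper-right corner means that the second
# shock has no contemporaneous effect on y.
impact = P
```

The results depend on the ordering of the variables; reversing the order places the zero restriction on the other equation.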


Before closing this section, a word of warning is in order. It is very tempting to
use t-statistics to perform significance tests on the cointegrating vector. However, you
must avoid this temptation since, in general, the coefficients do not have an asymptotic
t-distribution. To explain, suppose you estimate (6.30) so that you have a model in the
form $y_t = \beta_0 + \beta_1 z_t + e_t$. Even if the variables are cointegrated, the $\{e_t\}$ sequence is
likely to be serially correlated. Moreover, since $y_t$ and $z_t$ are jointly determined variables,
there is a simultaneity problem so that $\{z_t\}$ cannot be treated as an "independent" variable.
There is one case in which the t-statistics are appropriate. Suppose that the cointegration
relationship between $\{y_t\}$ and $\{z_t\}$ is such that

$$y_t = \beta_0 + \beta_1 z_t + \varepsilon_{1t}$$

$$\Delta z_t = \varepsilon_{2t}$$

where $E\varepsilon_{1t}\varepsilon_{2t} = 0$.

The notation is designed to illustrate the point that the residuals from both
equations are uncorrelated white-noise disturbances. The set of assumptions is fairly
restrictive in that the residuals from both equations must be serially uncorrelated and
the cross-correlations must be zero. If these conditions hold, the OLS estimates of
$\beta_0$ and $\beta_1$ can be tested using t-tests and F-tests. If the disturbances are not normally
distributed, the asymptotic results are such that t-tests and F-tests are appropriate. Be
aware that both conditions are necessary to perform such tests. If $E\varepsilon_{1t}\varepsilon_{2t} \neq 0$, $\{z_t\}$ is
not exogenous since shocks to $\varepsilon_{1t}$ affect $z_t$. Moreover, as in a standard regression, if
the residuals of the cointegrating vector are serially correlated, inference concerning
the coefficients is inappropriate. Phillips and Hansen (1990) develop a procedure
that allows you to construct confidence intervals for the $\beta_i$ in the presence of serial
correlation and the lack of independence of the $\{z_t\}$ sequence. The details are discussed
in Appendix 6.2 in the Supplementary Manual.


5. ILLUSTRATING THE ENGLE-GRANGER 
METHODOLOGY 


Figure 6.2 shows three simulated variables that can be used to illustrate the 
Engle—Granger procedure. Inspection of the figure suggests that each is nonstationary, 
and there is no visual evidence that any pair is cointegrated. As detailed in Table 6.1,
each series is constructed as the sum of a stochastic trend component plus an
autoregressive irregular component.

FIGURE 6.2 Three Cointegrated Series

Table 6.1 The Simulated Series

                  $\{y_t\}$                                        $\{z_t\}$                                        $\{w_t\}$
Trend             $\mu_{yt} = \mu_{yt-1} + \varepsilon_{yt}$       $\mu_{zt} = \mu_{zt-1} + \varepsilon_{zt}$       $\mu_{wt} = \mu_{yt} + \mu_{zt}$
Pure Irregular    $\delta_{yt} = 0.5\delta_{yt-1} + \eta_{yt}$     $\delta_{zt} = 0.5\delta_{zt-1} + \eta_{zt}$     $\delta_{wt} = 0.5\delta_{wt-1} + \eta_{wt}$
Series            $y_t = \mu_{yt} + \delta_{yt}$                   $z_t = \mu_{zt} + \delta_{zt} + 0.5\delta_{yt}$  $w_t = \mu_{wt} + \delta_{wt} + 0.5\delta_{yt} + 0.5\delta_{zt}$

The first column of the table contains the formulas used to construct the $\{y_t\}$
sequence. First, 150 realizations of a white-noise process were drawn to represent the
$\{\varepsilon_{yt}\}$ sequence. Initializing $\mu_{y0} = 0$, 150 values of the random walk process $\{\mu_{yt}\}$ were
constructed using the formula $\mu_{yt} = \mu_{yt-1} + \varepsilon_{yt}$ (see the first cell of the table). Another
150 realizations of a white-noise process were drawn to represent the $\{\eta_{yt}\}$ sequence;
given the initial condition $\delta_{y0} = 0$, these realizations were used to construct $\{\delta_{yt}\}$ as
$\delta_{yt} = 0.5\delta_{yt-1} + \eta_{yt}$ (see the next lower cell). Adding the two constructed series yields
150 realizations for $\{y_t\}$. To help ensure randomness, only the last 100 observations are
used in the simulated study. Hence, $\{y_t\}$ is the sum of a stochastic trend and a stationary
(i.e., irregular) component.

The $\{z_t\}$ sequence was constructed in a similar fashion; the $\{\varepsilon_{zt}\}$ and $\{\eta_{zt}\}$
sequences are each represented by two different sets of 150 random numbers. The
trend $\{\mu_{zt}\}$ and the autoregressive irregular term $\{\delta_{zt}\}$ were constructed as shown
in the second column of the table. The $\{\delta_{zt}\}$ sequence can be thought of as a pure
irregular component in the $\{z_t\}$ sequence. In order to introduce correlation between
the $\{y_t\}$ and $\{z_t\}$ sequences, the irregular component in $\{z_t\}$ was constructed as the
sum $\delta_{zt} + 0.5\delta_{yt}$. In the third column you can see that the trend in $\{w_t\}$ is the simple
summation of the trends in the other two series. As such, the three series have the
cointegrating vector $(1, 1, -1)$. The irregular component in $\{w_t\}$ is the sum of the pure
innovation $\delta_{wt}$ and 50% of the innovations $\delta_{yt}$ and $\delta_{zt}$.
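
The construction in Table 6.1 can be reproduced along the following lines. This is a sketch rather than the code used to generate COINT6.XLS: the seed is arbitrary, the white-noise draws are assumed to be standard normal (the text does not state the distribution or variance), and the helper names `random_walk` and `ar1` are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed
N, KEEP = 150, 100                 # draw 150 values, keep the last 100


def random_walk(shocks):
    """mu_t = mu_{t-1} + eps_t with mu_0 = 0."""
    return np.cumsum(shocks)


def ar1(shocks, rho=0.5):
    """delta_t = rho * delta_{t-1} + eta_t with delta_0 = 0."""
    out = np.zeros(len(shocks))
    prev = 0.0
    for t, eta in enumerate(shocks):
        prev = rho * prev + eta
        out[t] = prev
    return out


# Trends and pure irregular components, each built from its own set of draws.
mu_y = random_walk(rng.standard_normal(N))
mu_z = random_walk(rng.standard_normal(N))
d_y, d_z, d_w = (ar1(rng.standard_normal(N)) for _ in range(3))

# Series as defined in the last row of Table 6.1.
y = mu_y + d_y
z = mu_z + d_z + 0.5 * d_y
w = (mu_y + mu_z) + d_w + 0.5 * d_y + 0.5 * d_z

# Keep only the last 100 observations, as in the text.
y, z, w = y[-KEEP:], z[-KEEP:], w[-KEEP:]

# The combination with weights (1, 1, -1) removes both stochastic trends.
combo = y + z - w
```

By construction `combo` equals $\delta_{yt} + 0.5\delta_{zt} - \delta_{wt}$, a sum of stationary components, which is why $(1, 1, -1)$ is a cointegrating vector.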

Now pretend that we do not know the data-generating process. The issue is whether
the Engle-Granger methodology can uncover the essential details of the process. Use
the data on the file COINT6.XLS to follow along. The first step is to pretest the variables
in order to determine their order of integration. Consider the augmented Dickey-Fuller
regression equation for $\{y_t\}$:

$$\Delta y_t = a_0 + a_1 y_{t-1} + \sum_{i=1}^{n} a_{i+1}\Delta y_{t-i} + \varepsilon_t$$

If the data happened to be quarterly, it would be natural to perform the augmented
Dickey-Fuller tests using lag lengths that are multiples of 4 (i.e., $n = 4, 8, \ldots$). For
each series, the results of the Dickey-Fuller test and the augmented test using 4 lags
are reported in Table 6.2.

With 100 observations and a constant, the 5% critical value for the Dickey-Fuller
test is -2.89. Since the absolute values of all t-statistics are well below this critical
value, we cannot reject the null hypothesis of a unit root in any of the series. Of course,
if there were any serious doubt about the presence of a unit root, we could use the
procedures in Chapter 4 to test for the presence of the drift term. If various lag lengths
yield different results, we would want to test for the most appropriate lag length.
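
The pretest regression can be sketched as a small function. The name `adf_tstat`, the seed, and the simulated inputs are assumptions for illustration; the function estimates the augmented Dickey-Fuller regression with a constant by OLS and returns the t-statistic on $a_1$, which would then be compared with the appropriate Dickey-Fuller critical value (such as -2.89 for 100 observations and a constant).

```python
import numpy as np


def adf_tstat(x, n_lags=0):
    """t-statistic on a1 in: dx_t = a0 + a1*x_{t-1} + sum_i a_{i+1}*dx_{t-i} + e_t."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                                   # dx[k] = x[k+1] - x[k]
    rows = np.arange(n_lags, len(dx))
    cols = [np.ones(len(rows)), x[rows]]              # constant and lagged level
    cols += [dx[rows - i] for i in range(1, n_lags + 1)]
    X = np.column_stack(cols)
    yv = dx[rows]
    b = np.linalg.lstsq(X, yv, rcond=None)[0]
    resid = yv - X @ b
    s2 = resid @ resid / (len(yv) - X.shape[1])       # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return b[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(3)                        # arbitrary seed
noise = rng.standard_normal(200)

t_stationary_0 = adf_tstat(noise, n_lags=0)           # white noise, no lags
t_stationary_4 = adf_tstat(noise, n_lags=4)           # white noise, 4 lags
t_walk_4 = adf_tstat(np.cumsum(noise), n_lags=4)      # random walk, 4 lags
```

For the white-noise input the statistic should lie far below the critical value, so the unit root null is rejected; for the random walk it usually does not.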

The luxury of using simulated data is that we can avoid these potentially sticky
problems and move on to Step 2. Since all three variables are presumed to be jointly
determined, the long-run equilibrium regression can be estimated using either $y_t$, $z_t$,
or $w_t$ as the "left-hand-side" variable. The three estimates of the long-run relationship
(with t-values in parentheses) are

$$y_t = \underset{(-0.58)}{-0.048} - \underset{(-38.10)}{0.927}z_t + \underset{(53.461)}{0.977}w_t + e_{yt}$$

$$z_t = \underset{(-0.67)}{0.0590} - \underset{(-38.10)}{1.011}y_t + \underset{(65.32)}{1.026}w_t + e_{zt}$$

$$w_t = \underset{(-1.01)}{-0.085} + \underset{(53.46)}{0.990}y_t + \underset{(65.32)}{0.953}z_t + e_{wt}$$

where $e_{yt}$, $e_{zt}$, and $e_{wt}$ = the residuals from the three equilibrium regressions.

The essence of the test is to determine whether the residuals from the equilibrium
regression are stationary. Again, in performing the test, there is no presumption that
any one of the three residual series is preferable to any of the others. If we use each of
the three series to estimate an equation in the form of (6.31) [or (6.32)], the estimated
values of $a_1$ are given in Table 6.3.

Table 6.2 Estimated $a_1$ and the Associated t-statistic

                  No Lags       4 Lags
$\Delta y_t$      -0.020        -0.027
                  (-0.742)      (-1.047)
$\Delta z_t$      -0.021        -0.258
                  (-0.992)      (-1.144)
$\Delta w_t$      -0.035        -0.037
                  (-1.908)      (-1.936)

Table 6.3 Estimated $a_1$ and the Associated t-statistic

                  No Lags       4 Lags
$\Delta e_{yt}$   -0.443        -0.595
                  (-5.175)      (-4.074)
$\Delta e_{zt}$   -0.452        -0.593
                  (-5.379)      (-4.226)
$\Delta e_{wt}$   -0.455        -0.607
                  (-5.390)      (-4.225)


From Table C, you can see that the critical value of the t-statistic is -3.828.
Hence, using any one of the three equilibrium regressions, we can conclude that the
series are cointegrated of order (1, 1). Fortunately, all three equilibrium regressions
yield this same conclusion. We should be very wary of a result indicating that the variables
are cointegrated using one variable for the normalization but are not cointegrated
using another variable for the normalization. In such circumstances, it is possible that
only a few of the variables are cointegrated. Suppose that $x_{1t}$, $x_{2t}$, and $x_{3t}$ are three
$I(1)$ variables and that $x_{1t}$ and $x_{2t}$ are cointegrated such that $x_{1t} - \beta_2 x_{2t}$ is stationary.
A regression of $x_{1t}$ on the other two variables should yield the stationary relationship
$x_{1t} = \beta_2 x_{2t} + 0x_{3t}$. Similarly, a regression of $x_{2t}$ on the other variables should yield the
stationary relationship $x_{2t} = (1/\beta_2)x_{1t} + 0x_{3t}$. However, a regression of $x_{3t}$ on $x_{1t}$ and
$x_{2t}$ cannot reveal the cointegrating relationship. Nevertheless, the possibility of a
contradictory result is a weakness of the test.

You must be careful in conducting significance tests on the estimated equilib- 
rium regressions. As mentioned above, the coefficients do not have an asymptotic 
t-distribution unless the right-hand-side variables are actually independent and the error 
terms are serially uncorrelated. From Table 6.1, it should be clear that these assumptions 
are violated by the data generating process. 

Step 3 entails estimating the error-correction model. Consider the first-order system
shown with t-statistics in parentheses:

$$\Delta y_t = \underset{(0.19)}{0.006} + \underset{(2.79)}{0.418}e_{wt-1} + \underset{(1.08)}{0.178}\Delta y_{t-1} + \underset{(1.94)}{0.313}\Delta z_{t-1} - \underset{(-2.27)}{0.368}\Delta w_{t-1} + \varepsilon_{yt} \tag{6.37}$$

$$\Delta z_t = \underset{(-1.12)}{-0.042} + \underset{(0.42)}{0.074}e_{wt-1} + \underset{(0.75)}{0.145}\Delta y_{t-1} + \underset{(1.38)}{0.262}\Delta z_{t-1} - \underset{(-1.63)}{0.313}\Delta w_{t-1} + \varepsilon_{zt} \tag{6.38}$$

$$\Delta w_t = \underset{(-0.90)}{-0.040} - \underset{(-0.33)}{0.069}e_{wt-1} + \underset{(0.68)}{0.156}\Delta y_{t-1} + \underset{(1.35)}{0.301}\Delta z_{t-1} - \underset{(-1.87)}{0.420}\Delta w_{t-1} + \varepsilon_{wt} \tag{6.39}$$

where $e_{wt-1} = w_{t-1} + 0.0852 - 0.9901y_{t-1} - 0.9535z_{t-1}$ so that $e_{wt-1}$ is the lagged
value of the residual from the equilibrium relationship using $w_t$ as the dependent
variable.

Equations (6.37) through (6.39) comprise a first-order VAR augmented with the
single error-correction term $e_{wt-1}$. Again, there is an area of ambiguity since the residuals
from any of the "equilibrium" relationships could have been used in the estimation.
The signs of the speed of adjustment coefficients are in accord with convergence toward
the long-run equilibrium. In response to a positive discrepancy in $e_{wt-1}$, both $y_t$ and
$z_t$ tend to increase while $w_t$ tends to decrease. The error-correction term, however, is
significant only in (6.37).
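
A back-of-the-envelope calculation shows why these signs imply convergence. The sketch below uses only the point estimates reported in (6.37) through (6.39) and the weights that define $e_{wt}$; it ignores the intercepts and the lagged first differences, so it is a rough approximation and not a full dynamic simulation.

```python
# Point estimates of the error-correction coefficients in (6.37)-(6.39).
alpha = {"y": 0.418, "z": 0.074, "w": -0.069}

# e_w = w + 0.0852 - 0.9901*y - 0.9535*z, so the weights on (y, z, w) are:
weights = {"y": -0.9901, "z": -0.9535, "w": 1.0}

# Change in the discrepancy induced by a one-unit value of the lagged residual,
# ignoring the constants and the lagged first differences.
delta_e = sum(weights[k] * alpha[k] for k in alpha)

# Share of the discrepancy eliminated within one period under this approximation.
share_closed = -delta_e
```

Under this approximation a one-unit positive discrepancy changes $e_{wt}$ by about -0.55, so a little over half of the deviation from the long-run relationship is eliminated within one period.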

Finally, the diagnostic methods discussed in the last section should be applied 
to (6.37) through (6.39) in order to assess the model’s adequacy. Using actual data, 
lag-length tests and the properties of the residuals need to be considered. Moreover, 
innovation accounting could help determine whether the model is adequate. Question 
2 at the end of the chapter asks you to perform some of these diagnostics. 


The Engle-Granger Procedure with I(2) Variables


Multicointegration refers to a situation in which a linear combination of $I(2)$ and $I(1)$
variables is integrated of order zero. For example, suppose that $x_{1t}$ and $x_{2t}$ are $I(2)$ and
that $z_t$ is $I(1)$. It is possible that a linear combination of $x_{1t}$ and $x_{2t}$ is $I(1)$ and that this
combination is cointegrated with $z_t$. Hence, it is possible to have a long-run equilibrium
relationship of the form

$$x_{1t} = \beta_1 x_{2t} + \delta_1 z_t$$

However, a richer set of possibilities is given by the stationary relationship

$$x_{1t} = \beta_1 x_{2t} + \gamma_1\Delta x_{2t} + \delta_1 z_t$$

This specification allows for the possibility that the linear combination $x_{1t} - \beta_1 x_{2t}$
is $I(1)$ and cointegrated with the other $I(1)$ independent variables in the system: $\Delta x_{2t}$
and $z_t$. To make sure you understand the issue, ask yourself if it is possible for $\beta_1$ to be
zero. The answer is a resounding no. If $\beta_1 = 0$, the $I(2)$ variable $x_{1t}$ cannot, by itself,
be cointegrated with the $I(1)$ variables.

In principle, it is possible to check for multicointegration using a two-step procedure.
First, search for a cointegrating relationship among the $I(2)$ variables and then use
this relationship to check for a possible cointegrating relationship with the remaining
$I(1)$ variables. Engsted, Gonzalo and Haldrup (1997) show that this procedure is effective
only if the cointegrating vector for the first step is known. Otherwise, the second
step is contaminated with the errors generated in the first step. In the most general form
of their one-step procedure, you estimate an equation in the form

$$x_{1t} = a_0 + a_1 t + a_2 t^2 + \beta_2 x_{2t} + \beta_3 x_{3t} + \gamma_1\Delta x_{2t} + \gamma_2\Delta x_{3t} + \delta_1 z_t + e_t \tag{6.40}$$

where $x_{1t}$, $x_{2t}$, and $x_{3t}$ are $I(2)$ variables, $z_t$ is a vector of $I(1)$ variables, and the
deterministic regressors can include a quadratic time trend.

Hence, the test allows you to include up to two $I(2)$ variables and an unrestricted
number of $I(1)$ variables as regressors. You might want to include the quadratic time
trend if $\Delta^2 x_{1t}$ contains a drift. Since the key issue is the stationarity of the $\{e_t\}$ series,
estimate a regression of the form

$$\Delta\hat{e}_t = \rho\hat{e}_{t-1} + \sum_{i=1}^{p} \rho_i\Delta\hat{e}_{t-i} + v_t$$

where $\{\hat{e}_t\}$ are the regression residuals from (6.40).

If it is possible to reject the null hypothesis $\rho = 0$, it is possible to conclude that
there is multicointegration. In addition to sample size, the critical values of the t-statistic
for the null hypothesis $\rho = 0$ depend on the number of $I(2)$ regressors ($m_2 = 1$ or 2), the
number of $I(1)$ regressors ($m_1 = 0$ to 4), and the form of the deterministic regressors.
The critical values are shown in Table D in the Supplementary Manual. Consider the
U.K. money demand equations for the sample period 1963Q1 to 1989Q2 estimated by
Haldrup (1994):


$$m_t = a_0 + 0.68p_t + 1.57y_t - 2.67r_t - 2.55\Delta p_t \tag{6.41}$$

and

$$m_t = a_0 + a_1 t + 0.89p_t + 2.39y_t - 2.69r_t - 3.25\Delta p_t \tag{6.42}$$


Pretesting the variables indicated that $m_t$ (as measured by the log of M1) and $p_t$ (the
log of the implicit price deflator) were $I(2)$ and that $y_t$ (the log of total final expenditure)
and $r_t$ (a measure of the interest rate differential) were $I(1)$. The only variable needing
explanation is the presence of $\Delta p_t$ in the money demand function. The idea is to allow
for the demand for money to depend on the inflation rate (i.e., change in the log of
the price level) since high inflation should reduce the desire to hold money balances.
Since there is a total of 105 observations, one $I(2)$ regressor (so that $m_2 = 1$), and three
$I(1)$ regressors, the 5% critical values for models without and with the linear trend are
-4.56 and -4.91, respectively. Using the residuals from the money demand equations
given by (6.41) and (6.42), Haldrup found that the t-statistics for the null hypothesis
$\rho = 0$ were -2.35 and -2.66, respectively. Hence, it is possible to conclude that the
two regressions are spurious (i.e., it is not possible to reject the null hypothesis of no
multicointegration).

Even though multicointegration fails, Haldrup goes on to experiment with vari- 
ous estimates of the error-correction mechanism. One interesting model (with standard 
errors in parentheses) is 


$$\Delta^2 m_t = \underset{(0.02)}{-0.04}\,\hat{e}_{t-1} + \text{stationary regressors}$$




where the stationary regressors can include lagged values of $\Delta^2 m_t$ as well as current
and lagged values of $\Delta^2 p_t$, $\Delta y_t$, $\Delta p_t$, and $\Delta r_t$. The point estimate is such that $\Delta^2 m_t$ is
expected to decline in response to a positive discrepancy from the long-run relationship.
The t-statistic of -0.04/0.02 = -2 suggests that the effect is just significant at the 5%
level.


6. COINTEGRATION AND PURCHASING POWER 
PARITY 


To illustrate the Engle-Granger methodology using actual data, reconsider the theory
of purchasing power parity (PPP). If $e_t$, $p_t^*$, and $p_t$ denote, respectively, the logarithms
of the price of foreign exchange, the foreign price level, and the domestic price level,
long-run PPP requires that $e_t + p_t^* - p_t$ be stationary. The unit root tests reported in
Chapter 4 indicate that real exchange rates (defined as $r_t = e_t + p_t^* - p_t$) appear to be
nonstationary. Cointegration offers an alternative method to check the theory; if PPP
holds, the sequence formed by the sum $\{e_t + p_t^*\}$ should be cointegrated with the $\{p_t\}$
sequence. Call the constructed dollar value of the foreign price level $f_t$; that is,
$f_t = e_t + p_t^*$. Long-run PPP asserts that there exists a linear combination of the form
$f_t = \beta_0 + \beta_1 p_t + \mu_t$ such that $\{\mu_t\}$ is stationary and the cointegrating vector is such that
$\beta_1 = 1$.

As reported in Chapter 4, in Enders (1988), I used price and exchange rate
data for Germany, Japan, Canada, and the United States for both the Bretton Woods
(1960-1971) and post-Bretton Woods (1973-1988) periods. Each series was converted
into an index number such that each series was equal to unity at the beginning
of its respective period (either 1960 or 1973). In the fixed exchange rate period, all
values of $\{e_t\}$ were set equal to unity. Pretesting the data indicated that for each period,
the U.S. price level $\{p_t\}$ and the dollar values of the foreign price levels $\{e_t + p_t^*\}$ both
contained a single unit root. Had the orders of integration differed, it would have been
possible to conclude immediately that long-run PPP had failed.

The next step was to estimate the long-run equilibrium relation by regressing each
$f_t = e_t + p_t^*$ on $p_t$ such that

$$f_t = \beta_0 + \beta_1 p_t + \mu_t \tag{6.43}$$


Absolute PPP asserts $f_t = p_t$, so this version of the theory requires $\beta_0 = 0$ and
$\beta_1 = 1$. A nonzero intercept $\beta_0$ is consistent with the relative version of PPP, requiring only
that domestic and foreign price levels are proportional to each other. Unless there are
compelling reasons to omit the constant, the recommended practice is to include an
intercept term in the equilibrium regression. In fact, Engle and Granger's (1987) original
Monte Carlo simulations all include intercept terms.

The estimated values of $\beta_1$ and their associated standard errors are reported in
Table 6.4. Note that five of the six values are estimated to be quite a bit below unity.
Be especially careful not to make too much of these findings. It is not appropriate to
conclude that each value of $\beta_1$ is significantly different from unity simply because the
values of $(1 - \beta_1)$ exceed two or three standard deviations. It is hard to overstate the
point that the assumptions underlying this type of t-test are not applicable because there
is no presumption that $p_t$ is the exogenous variable while $f_t$ is the dependent variable,
or that $\{\mu_t\}$ is white noise.

Table 6.4 The Equilibrium Regressions

                          Germany     Japan       Canada
1973-1986
  Estimated $\beta_1$     0.5374      0.8938      0.7749
  Standard Error          (0.0415)    (0.0316)    (0.0077)
1960-1971
  Estimated $\beta_1$     0.6660      0.7361      1.0809
  Standard Error          (0.0262)    (0.0154)    (0.0200)

The residuals from each regression equation, called $\{\hat{\mu}_t\}$, were checked for unit
roots. The unit root tests are straightforward because the residuals from a regression
equation have a zero mean and do not have a time trend. The following two equations
were estimated using the residuals from each long-run equilibrium relationship:

$$\Delta \hat{\mu}_t = a_1 \hat{\mu}_{t-1} + \varepsilon_t \tag{6.44}$$

and

$$\Delta \hat{\mu}_t = a_1 \hat{\mu}_{t-1} + \sum_{i=1}^{p} a_{i+1} \Delta \hat{\mu}_{t-i} + \varepsilon_t \tag{6.45}$$

Table 6.5 reports the estimated values of $a_1$ from (6.44) and from (6.45) using a lag
length of four. It bears repeating that failure to reject the null hypothesis $a_1 = 0$ means
we cannot reject the null of no cointegration. Alternatively, if $-2 < a_1 < 0$, it is possible
to conclude that the $\{\hat{\mu}_t\}$ sequence does not have a unit root and that the $\{f_t\}$ and $\{p_t\}$
sequences are cointegrated. Also note that it is not appropriate to use the confidence
intervals reported in Dickey and Fuller. The Dickey-Fuller statistics are inappropriate
because the residuals used in (6.44) and (6.45) are not the actual error terms. Rather,
these residuals are estimated error terms that are obtained from the estimate of the
equilibrium regression. If we knew the magnitudes of the actual errors in each period,
we could use the Dickey-Fuller tables.
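The two steps just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the series are simulated (they are not the PPP data), the helper names `ols` and `residual_df_tstat` are hypothetical, and the resulting t-statistic must still be compared with the Engle-Granger critical values (Table C) rather than the Dickey-Fuller tables.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients, residuals, and coefficient standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, resid, se

def residual_df_tstat(resid, lags=0):
    """t-statistic for a1 = 0 in d(mu_t) = a1*mu_{t-1} + sum_i a_{i+1}*d(mu_{t-i}) + e_t."""
    d = np.diff(resid)                          # d[k] = resid[k+1] - resid[k]
    y = d[lags:]                                # left-hand side: d(mu_t)
    cols = [resid[lags:-1]]                     # lagged level: mu_{t-1}
    for i in range(1, lags + 1):
        cols.append(d[lags - i:len(d) - i])     # lagged differences: d(mu_{t-i})
    beta, _, se = ols(y, np.column_stack(cols))
    return beta[0] / se[0]

# Simulated data (an assumption for illustration, not the PPP data set)
rng = np.random.default_rng(0)
T = 200
p = np.cumsum(rng.normal(size=T))               # I(1) series playing the role of p_t
mu = np.zeros(T)
shocks = rng.normal(size=T)
for t in range(1, T):
    mu[t] = 0.5 * mu[t - 1] + shocks[t]         # stationary deviation from equilibrium
f = 1.0 + p + mu                                # cointegrated with p by construction

# Step 1: long-run equilibrium regression, as in (6.43)
beta, mu_hat, _ = ols(f, np.column_stack([np.ones(T), p]))

# Step 2: unit root test on the residuals, as in (6.44) and (6.45)
t_no_lags = residual_df_tstat(mu_hat, lags=0)
t_four_lags = residual_df_tstat(mu_hat, lags=4)
print(beta, t_no_lags, t_four_lags)
```

Because the simulated deviation is stationary by construction, the residual-based statistic should be strongly negative; with actual data, of course, the outcome is not known in advance.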

Under the null hypothesis $a_1 = 0$, the critical values for the t-statistic depend on
sample size. Comparing the results reported in Table 6.5 with the critical values provided
by Table C indicates that only for Japan during the fixed exchange rate period
is it possible to reject the null hypothesis of no cointegration. At the 5% significance
level, the critical value of t is -3.398 for two variables and T = 100. Since the Japanese
t-statistic with four lags is -3.437, at the 5% significance level we can reject the null of
no cointegration (i.e., we accept the alternative that the variables are cointegrated) and
find in favor of PPP. For the other countries
in each time period, we cannot reject the null hypothesis of no cointegration and must
conclude that PPP generally failed.

Table 6.5 Dickey-Fuller Tests of the Residuals

                                 Germany     Japan       Canada
1973-1986
  No lags
    Estimated $a_1$              -0.0225     -0.0151     -0.1001
    Standard Error               (0.0169)    (0.0236)    (0.0360)
    t-statistic for $a_1 = 0$    -1.331      -0.640      -2.781
  4 lags
    Estimated $a_1$              -0.0316     -0.0522     -0.0983
    Standard Error               (0.0170)    (0.0236)    (0.0388)
    t-statistic for $a_1 = 0$    -1.859      -2.212      -2.533
1960-1971
  No lags
    Estimated $a_1$              -0.0189     -0.1137     -0.0528
    Standard Error               (0.0196)    (0.0449)    (0.0286)
    t-statistic for $a_1 = 0$    -0.966      -2.535      -1.846
  4 lags
    Estimated $a_1$              -0.0294     -0.1821     -0.0509
    Standard Error               (0.0198)    (0.0530)    (0.0306)
    t-statistic for $a_1 = 0$    -1.468      -3.437      -1.663

The third step in the methodology entails estimation of the error-correction
model. Only the Japan/U.S. model needs estimation since it is the sole case for which
cointegration holds. The final error-correction models for Japanese and U.S. price
levels during the 1960-1971 period were estimated to be


$$\Delta f_t = \underset{(0.00044)}{0.00119} - \underset{(0.04184)}{0.10548}\,\hat{\mu}_{t-1} \tag{6.46}$$

$$\Delta p_t = \underset{(0.00033)}{0.00156} + \underset{(0.03175)}{0.01114}\,\hat{\mu}_{t-1} \tag{6.47}$$

where $\hat{\mu}_{t-1}$ is the lagged residual from the long-run equilibrium regression. Note that
$\hat{\mu}_{t-1}$ is the estimated value of $f_{t-1} - \beta_0 - \beta_1 p_{t-1}$ and that standard errors are in
parentheses.

Lag length tests (see the discussion of $\chi^2$ and F-tests for lag length in Chapter 5)
indicated that lagged values of $\Delta f_{t-i}$ and $\Delta p_{t-i}$ did not need to be included in the
error-correction equations. Note that the point estimates in (6.46) and (6.47) indicate a
direct convergence to long-run equilibrium. For example, in the presence of a one-unit
deviation from long-run PPP in period t - 1, the Japanese price level converted into
dollars falls by 0.10548 units and the U.S. price level rises by 0.01114 units. Both of
these price changes in period t act to eliminate the positive discrepancy from long-run
PPP present in period t - 1.

Notice the discrepancy between the magnitudes of the two speed of adjustment
coefficients; in absolute value, the Japanese coefficient is approximately ten
times that of the U.S. coefficient. As compared to the Japanese price level, the
U.S. price level responded only slightly to a deviation from PPP. Moreover, the
error-correction term is about 1/3 of a standard deviation from zero for the U.S.
(0.01114/0.03175 = 0.3509) and approximately 2.5 standard deviations from zero
for Japan (0.10548/0.04184 = 2.5210). Hence, at the 5% significance level, we can
conclude that the speed of adjustment term is insignificantly different from zero for the
United States but not for Japan. This result is consistent with the idea that the United
States was a large country relative to Japan: movements in U.S. prices evolved
independently of events in Japan, but movements in exchange rate adjusted Japanese
prices responded to events in the United States.
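The arithmetic behind these comparisons is easy to check. The short Python fragment below is only a sketch: the variable names are hypothetical, and the "implied" adjustment of the deviation ignores the intercepts in (6.46) and (6.47). It uses the point estimates from (6.46) and (6.47) and the Japanese 1960-1971 slope from Table 6.4.

```python
# Point estimates and standard errors reported in (6.46) and (6.47)
alpha_f, se_f = -0.10548, 0.04184    # Japanese price level converted into dollars
alpha_p, se_p = 0.01114, 0.03175     # U.S. price level
beta_1 = 0.7361                      # Table 6.4, Japan, 1960-1971

# Distance of each speed of adjustment coefficient from zero, in standard errors
t_f = abs(alpha_f) / se_f
t_p = abs(alpha_p) / se_p

# Since mu_t = f_t - beta_0 - beta_1 * p_t, a one-unit deviation changes mu by
# alpha_f - beta_1 * alpha_p in the next period (intercepts ignored)
implied_a1 = alpha_f - beta_1 * alpha_p

print(round(t_f, 4), round(t_p, 4), round(implied_a1, 4))
```

The implied one-period change in the deviation, about -0.1137, agrees with the no-lag estimate of $a_1$ for Japan in 1960-1971 reported in Table 6.5, which is a useful consistency check on the two sets of estimates.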

You can update the study using the data contained in the file COINT_PPP.XLS.
The file contains quarterly values of the U.K., Japanese, and Canadian wholesale prices
and bilateral exchange rates with the United States. Germany is not included because
the pre-unification data for Germany are not compatible with the more recent data. The
file also contains the U.S. wholesale price level. Question 9 at the end of the chapter
guides you through the process. The data start in January 1973, and the question asks
you to test for PPP by determining whether the three variables $p_t$, $e_t$, and $p_t^*$ are cointegrated.


7. CHARACTERISTIC ROOTS, RANK, AND 
COINTEGRATION 


Although the Engle and Granger (1987) procedure is easily implemented, it does 
have several important defects. The estimation of the long-run equilibrium regression 
requires that the researcher place one variable on the left-hand side and use the 
others as regressors. For example, in the case of two variables, it is possible to run 
the Engle—Granger test for cointegration by using the residuals from either of the 
following two “equilibrium” regressions: 


$$y_t = \beta_{10} + \beta_{11} z_t + e_{1t} \tag{6.48}$$

or

$$z_t = \beta_{20} + \beta_{21} y_t + e_{2t} \tag{6.49}$$


As the sample size grows infinitely large, asymptotic theory indicates that the test
for a unit root in the $\{e_{1t}\}$ sequence becomes equivalent to the test for a unit root in
the $\{e_{2t}\}$ sequence. Unfortunately, the large sample properties from which this result is
derived may not be applicable to the sample sizes usually available to economists.
In practice, it is possible to find that one regression indicates that the variables are
cointegrated, whereas reversing the order indicates no cointegration. This is a very
undesirable feature of the procedure because the test for cointegration should be invariant
to the choice of the variable selected for normalization. The problem is obviously
compounded using three or more variables since any of the variables can be selected as
the left-hand-side variable. Moreover, in tests using three or more variables, we know
that there may be more than one cointegrating vector. The method has no systematic
procedure for the separate estimation of the multiple cointegrating vectors.

Another defect of the Engle-Granger procedure is that it relies on a two-step estimator.
The first step is to generate the residual series $\{\hat{e}_t\}$, and the second step uses
these generated errors to estimate a regression of the form $\Delta \hat{e}_t = a_1 \hat{e}_{t-1} + \cdots$. Thus,
the coefficient $a_1$ is obtained by estimating a regression using the residuals from another
regression. Hence, any error introduced by the researcher in Step 1 is carried into
Step 2. Fortunately, several methods have been developed that avoid these problems.
The Johansen (1988) and the Stock and Watson (1988) maximum likelihood estimators
circumvent the use of two-step estimators and can estimate and test for the presence
of multiple cointegrating vectors. Moreover, these tests allow the researcher to test
restricted versions of the cointegrating vector(s) and the speed of adjustment parameters.
Often, we want to determine whether it is possible to verify a theory by testing
restrictions on the magnitudes of the estimated coefficients.

The Johansen (1988) procedure relies heavily on the relationship between the rank 
of a matrix and its characteristic roots. Appendix 6.1 reviews the essentials of these 
concepts; those of you wanting more details should review this material. For those 
wanting an intuitive explanation, notice that the Johansen procedure is nothing more 
than a multivariate generalization of the Dickey-Fuller test. In the univariate case, it is
possible to view the stationarity of $\{y_t\}$ as being dependent on the magnitude of $a_1$; that
is,

$$y_t = a_1 y_{t-1} + \varepsilon_t$$

or

$$\Delta y_t = (a_1 - 1)y_{t-1} + \varepsilon_t$$

If $(a_1 - 1) = 0$, the $\{y_t\}$ process has a unit root. Ruling out the case in which $\{y_t\}$
is explosive, if $(a_1 - 1) \neq 0$ we can conclude that the $\{y_t\}$ sequence is stationary. The
Dickey-Fuller tables provide the appropriate statistics to formally test the null hypothesis
$(a_1 - 1) = 0$. Now consider the simple generalization to n variables; as in (6.26),
let

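$$x_t = A_1 x_{t-1} + \varepsilon_t$$

so that

$$\begin{aligned}
\Delta x_t &= A_1 x_{t-1} - x_{t-1} + \varepsilon_t \\
&= (A_1 - I)x_{t-1} + \varepsilon_t \\
&= \pi x_{t-1} + \varepsilon_t
\end{aligned} \tag{6.50}$$

where $x_t$ and $\varepsilon_t$ = $(n \times 1)$ vectors
$A_1$ = an $(n \times n)$ matrix of parameters
$I$ = an $(n \times n)$ identity matrix
$\pi$ is defined to be $(A_1 - I)$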


As indicated in the discussion surrounding (6.27), the rank of $(A_1 - I)$ equals the
number of cointegrating vectors. By analogy to the univariate case, if $(A_1 - I)$ consists
of all zeroes, so that rank($\pi$) = 0, all of the $\{x_{it}\}$ sequences are unit root processes.
Since there is no linear combination of the $\{x_{it}\}$ processes that is stationary, the variables
are not cointegrated. If we rule out characteristic roots that are greater than unity
and if rank($\pi$) = n, (6.50) represents a convergent system of difference equations, so
that all variables are stationary.
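The three cases can be illustrated with a minimal Python sketch. The matrices below are hypothetical examples chosen for illustration, the function name `classify` is an assumption, and the numerical tolerance used to decide the rank is a convenience that has nothing to do with statistical significance.

```python
import numpy as np

def classify(pi, tol=1e-10):
    """Describe the system dx_t = pi @ x_{t-1} + e_t by the rank of pi."""
    n = pi.shape[0]
    rank = int(np.linalg.matrix_rank(pi, tol=tol))
    if rank == 0:
        return rank, "n unit root processes, no cointegration"
    if rank == n:
        return rank, "full rank: all variables stationary (explosive roots ruled out)"
    return rank, f"{rank} cointegrating vector(s)"

# Hypothetical (2 x 2) examples of pi = A1 - I
pi_none = np.zeros((2, 2))                        # A1 = I
pi_one = np.array([[-0.2, 0.2], [0.2, -0.2]])     # rows are multiples of each other
pi_full = np.array([[-0.5, 0.0], [0.0, -0.5]])    # A1 = 0.5 * I

for name, pi in [("none", pi_none), ("one", pi_one), ("full", pi_full)]:
    print(name, classify(pi))
```

In applied work the elements of $\pi$ are estimated, so the rank cannot simply be read off in this way; it has to be inferred from a formal test.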

There are several ways to generalize (6.50). The equation is easily modified to 
allow for the presence of a drift term; simply let 


$$\Delta x_t = A_0 + \pi x_{t-1} + \varepsilon_t \tag{6.51}$$

where $A_0$ = the $(n \times 1)$ vector of constants $(a_{10}, a_{20}, \ldots, a_{n0})'$

The effect of including the various $a_{i0}$ is to allow for the possibility of a linear
time trend in the data-generating process. You would want to include the drift term if
the variables exhibited a decided tendency to increase or decrease. Here, the rank of $\pi$
can be viewed as the number of cointegrating relationships existing in the "detrended"
data. In the long run, $\pi x_{t-1} = 0$ so that each $\{\Delta x_{it}\}$ sequence has an expected value
of $a_{i0}$. Aggregating all such changes over t yields the deterministic expression $a_{i0}t$.

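Figure 6.3 illustrates the effects of including a drift in the data-generating process.
Two random sequences with 100 observations each were generated; denote these
sequences as $\{\varepsilon_{yt}\}$ and $\{\varepsilon_{zt}\}$. Initializing $y_0 = z_0 = 0$, we constructed the next 100 values
of the $\{y_t\}$ and $\{z_t\}$ sequences as

$$\begin{bmatrix} \Delta y_t \\ \Delta z_t \end{bmatrix} = \begin{bmatrix} -0.2 & 0.2 \\ 0.2 & -0.2 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}$$

so that the cointegrating relationship is

$$-0.2y_{t-1} + 0.2z_{t-1} = 0$$

or

$$y_t = z_t$$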

In the top graph (a) of Figure 6.3, you can see that each sequence resembles a
random walk process and that neither wanders too far from the other. The next graph
(b) adds drift coefficients such that $a_{10} = a_{20} = 0.1$; now each series tends to increase
by 0.1 units in each period. In addition to the fact that each sequence shares the same
stochastic trend, note that each also has the same deterministic time trend. The fact that
each has the same deterministic trend is not a result of the equivalence between $a_{10}$
and $a_{20}$; since $y_t$ and $z_t$ are cointegrated, the general solution to (6.51) necessitates that
each have the same linear trend. For verification, Panel (c) sets $a_{10} = 0.1$ and $a_{20} = 0.4$.
Again, the sequences have the same stochastic and deterministic trends. As an aside,
note that increasing $a_{20}$ and decreasing $a_{10}$ would have an ambiguous effect on the slope
of the deterministic trend. This point will be important in a moment; by appropriately
manipulating the elements of $A_0$ it is possible to include a constant in the cointegrating
vector(s) without imparting a deterministic time trend to the system.

[FIGURE 6.3 Drifts and Intercepts in Cointegrating Relationships. Panel (a): No Drift or Intercept. Panel (b): Drift Coefficients = (0.1, 0.1).]

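One way to include a constant in the cointegrating relationships is to restrict the
values of the various $a_{i0}$. For example, if rank($\pi$) = 1, the rows of $\pi$ can differ by only
a scalar, so that it is possible to write each $\{\Delta x_{it}\}$ sequence in (6.51) as

$$\begin{aligned}
\Delta x_{1t} &= \pi_{11}x_{1,t-1} + \pi_{12}x_{2,t-1} + \cdots + \pi_{1n}x_{n,t-1} + a_{10} + \varepsilon_{1t} \\
\Delta x_{2t} &= s_2(\pi_{11}x_{1,t-1} + \pi_{12}x_{2,t-1} + \cdots + \pi_{1n}x_{n,t-1}) + a_{20} + \varepsilon_{2t} \\
&\;\;\vdots \\
\Delta x_{nt} &= s_n(\pi_{11}x_{1,t-1} + \pi_{12}x_{2,t-1} + \cdots + \pi_{1n}x_{n,t-1}) + a_{n0} + \varepsilon_{nt}
\end{aligned}$$

where $s_i$ = scalars such that $s_i\pi_{1j} = \pi_{ij}$.
If the $a_{i0}$ can be restricted such that $a_{i0} = s_i a_{10}$, it follows that all of the $\{\Delta x_{it}\}$
sequences can be written with the constant included in the cointegrating vector: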


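$$\begin{aligned}
\Delta x_{1t} &= (\pi_{11}x_{1,t-1} + \pi_{12}x_{2,t-1} + \cdots + \pi_{1n}x_{n,t-1} + a_{10}) + \varepsilon_{1t} \\
\Delta x_{2t} &= s_2(\pi_{11}x_{1,t-1} + \pi_{12}x_{2,t-1} + \cdots + \pi_{1n}x_{n,t-1} + a_{10}) + \varepsilon_{2t} \\
&\;\;\vdots \\
\Delta x_{nt} &= s_n(\pi_{11}x_{1,t-1} + \pi_{12}x_{2,t-1} + \cdots + \pi_{1n}x_{n,t-1} + a_{10}) + \varepsilon_{nt}
\end{aligned}$$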


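or in compact form,

$$\Delta x_t = \pi^* x^*_{t-1} + \varepsilon_t \tag{6.52}$$

where

$$x_t = (x_{1t}, x_{2t}, \ldots, x_{nt})'$$

$$x^*_{t-1} = (x_{1,t-1}, x_{2,t-1}, \ldots, x_{n,t-1}, 1)'$$

$$\pi^* = \begin{bmatrix} \pi_{11} & \pi_{12} & \cdots & \pi_{1n} & a_{10} \\ \pi_{21} & \pi_{22} & \cdots & \pi_{2n} & a_{20} \\ \vdots & \vdots & & \vdots & \vdots \\ \pi_{n1} & \pi_{n2} & \cdots & \pi_{nn} & a_{n0} \end{bmatrix}$$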


The interesting feature of (6.52) is that the linear trend is purged from the system.
In essence, the various $a_{i0}$ have been altered in such a way that the general solution for
each $\{x_{it}\}$ does not contain a time trend. The solution to the set of difference equations
represented by (6.52) is such that all $\Delta x_{it}$ are expected to equal zero when
$\pi_{11}x_{1,t-1} + \pi_{12}x_{2,t-1} + \cdots + \pi_{1n}x_{n,t-1} + a_{10} = 0$.




To highlight the difference between (6.51) and (6.52), the last graph (d) of
Figure 6.3 illustrates the consequences of setting $a_{10} = 0.1$ and $a_{20} = -0.1$. You can
see that neither sequence contains a deterministic trend. In fact, for the data shown
in the figure, the trend will vanish so long as we select values of the drift terms
maintaining the relationship $a_{10} = -a_{20}$ (Question 1 at the end of this chapter will
help you to demonstrate this result).
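The role of the drift terms can be checked numerically. The Python sketch below iterates the two-variable system used for Figure 6.3 with the random shocks switched off, an assumption made here so that only the deterministic part of the solution is visible; the function name `simulate` is hypothetical.

```python
import numpy as np

# pi matrix of the two-variable system used for Figure 6.3
pi = np.array([[-0.2, 0.2],
               [0.2, -0.2]])

def simulate(a0, T=100, shocks=None):
    """Iterate dx_t = a0 + pi @ x_{t-1} + eps_t starting from x_0 = (0, 0)."""
    a0 = np.asarray(a0, dtype=float)
    eps = np.zeros((T, 2)) if shocks is None else shocks
    x = np.zeros((T + 1, 2))
    for t in range(1, T + 1):
        x[t] = x[t - 1] + a0 + pi @ x[t - 1] + eps[t - 1]
    return x

# Drift settings corresponding to Panels (a) through (d)
cases = [(0.0, 0.0), (0.1, 0.1), (0.1, 0.4), (0.1, -0.1)]
for a0 in cases:
    x = simulate(a0)
    slope = x[-1] - x[-2]      # per-period change after the transient has died out
    print(a0, np.round(slope, 4), np.round(x[-1], 4))
```

With drifts (0.1, 0.1) both series grow by 0.1 per period, with (0.1, 0.4) both grow by the common amount 0.25, and with (0.1, -0.1) the per-period change goes to zero, so the constants shift the levels without producing a trend.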

Some econometricians prefer to include an intercept term in the cointegrating vec- 
tor along with a drift term. This makes sense if the variables contain a drift and if 
economic theory suggests that the cointegrating vector contains an intercept. However, 
it should be clear that the intercept in the cointegrating vector is not identified in the 
presence of a drift term. After all, some portion of the unrestricted drift can always 
be included in the cointegration vector. In terms of the example above, the system can 
always be written as 


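$$\begin{aligned}
\Delta x_{1t} &= (\pi_{11}x_{1,t-1} + \pi_{12}x_{2,t-1} + \cdots + \pi_{1n}x_{n,t-1} + b_{10}) + b_{11} + \varepsilon_{1t} \\
&\;\;\vdots \\
\Delta x_{nt} &= s_n(\pi_{11}x_{1,t-1} + \pi_{12}x_{2,t-1} + \cdots + \pi_{1n}x_{n,t-1} + b_{10}) + b_{n1} + \varepsilon_{nt}
\end{aligned}$$

where $b_{i1}$ is defined to be the value that satisfies $s_i b_{10} + b_{i1} = a_{i0}$.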

All that was done is to divide $a_{i0}$ into two parts and to place one part inside the
cointegrating relationship. As such, some identification strategy is necessary since the 
proportion of the drift to include in the cointegrating vector is arbitrary. The popular 
software package EViews, for example, identifies the portion belonging in the coin- 
tegrating vector as the amount necessary to force the error-correction term to have a 
sample mean of zero. Nevertheless, as you can see from Figure 6.3, a drift term outside 
of the cointegrating relationship is necessary to capture the effects of a sustained ten- 
dency for the variables to increase (or decrease). Most researchers include drift terms 
if the data match Panels (b) or (c) of Figure 6.3. Otherwise, they include intercepts in 
the cointegrating vector or exclude the deterministic regressors altogether. If you are 
unsure, you can use the methods described in the next section to test whether the drifts 
can be appropriately restricted. Some software packages allow you to include a deter- 
ministic time trend in the model. However, it is best to avoid the use of a trend as an 
explanatory variable unless you have a good reason to include it in the model. Johansen 
(1994) discusses the role of the deterministic regressors in a cointegrating relationship. 

As with the augmented Dickey—Fuller test, the multivariate model can also be 
generalized to allow for a higher-order autoregressive process. Consider 


$$x_t = A_1x_{t-1} + A_2x_{t-2} + \cdots + A_px_{t-p} + \varepsilon_t \tag{6.53}$$

where

$x_t$ = the $(n \times 1)$ vector $(x_{1t}, x_{2t}, \ldots, x_{nt})'$
$\varepsilon_t$ = an independently and identically distributed n-dimensional vector with
zero mean and variance matrix $\Sigma_\varepsilon$.

Equation (6.53) can be put in a more usable form by adding and subtracting
$A_px_{t-p+1}$ on the right-hand side to obtain

$$x_t = A_1x_{t-1} + A_2x_{t-2} + A_3x_{t-3} + \cdots + A_{p-2}x_{t-p+2} + (A_{p-1} + A_p)x_{t-p+1} - A_p\Delta x_{t-p+1} + \varepsilon_t$$

Next, add and subtract $(A_{p-1} + A_p)x_{t-p+2}$ to obtain

$$x_t = A_1x_{t-1} + A_2x_{t-2} + A_3x_{t-3} + \cdots - (A_{p-1} + A_p)\Delta x_{t-p+2} - A_p\Delta x_{t-p+1} + \varepsilon_t$$

Just as in the augmented Dickey-Fuller test developed in Chapter 4, we can continue
in this fashion to obtain

$$\Delta x_t = \pi x_{t-1} + \sum_{i=1}^{p-1}\pi_i\Delta x_{t-i} + \varepsilon_t \tag{6.54}$$

where $\pi = -\left(I - \sum_{i=1}^{p} A_i\right)$ and $\pi_i = -\sum_{j=i+1}^{p} A_j$

Again, the key feature to note in (6.54) is the rank of the matrix $\pi$; the rank of $\pi$ is
equal to the number of independent cointegrating vectors. Clearly, if rank($\pi$) = 0, the
matrix is null and (6.54) is the usual VAR model in first differences. Instead, if $\pi$ is of
rank n, the vector process is stationary. In intermediate cases, if rank($\pi$) = 1, there is
a single cointegrating vector and the expression $\pi x_{t-1}$ is the error-correction term. For
other cases in which 1 < rank($\pi$) < n, there are multiple cointegrating vectors.
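The mapping from the VAR coefficients in (6.53) to the matrices in (6.54) can be verified numerically. In the Python sketch below, the coefficient matrices `A1` and `A2` are hypothetical values chosen so that $\pi$ has rank one, and the function name `var_to_vecm` is likewise an assumption made for illustration.

```python
import numpy as np

def var_to_vecm(A):
    """Map VAR(p) matrices [A_1, ..., A_p] to (pi, [pi_1, ..., pi_{p-1}]) as in (6.54)."""
    A = [np.asarray(a, dtype=float) for a in A]
    n = A[0].shape[0]
    pi = -(np.eye(n) - sum(A))                       # pi = -(I - sum of all A_i)
    pis = [-sum(A[i:]) for i in range(1, len(A))]    # pi_i = -(A_{i+1} + ... + A_p)
    return pi, pis

# Hypothetical VAR(2) coefficients
A1 = np.array([[0.6, 0.2], [0.1, 0.7]])
A2 = np.array([[0.2, 0.0], [0.1, 0.1]])

# Simulate the VAR in levels
rng = np.random.default_rng(1)
T = 50
eps = rng.normal(size=(T, 2))
x = np.zeros((T, 2))
for t in range(2, T):
    x[t] = A1 @ x[t - 1] + A2 @ x[t - 2] + eps[t]

# Check that the error-correction form reproduces the same changes
pi, (pi_1,) = var_to_vecm([A1, A2])
dx = np.diff(x, axis=0)                              # dx[k] = x[k+1] - x[k]
lhs = dx[1:]                                         # dx_t for t = 2, ..., T-1
rhs = x[1:-1] @ pi.T + dx[:-1] @ pi_1.T + eps[2:]    # pi x_{t-1} + pi_1 dx_{t-1} + eps_t
print(np.allclose(lhs, rhs), np.linalg.matrix_rank(pi, tol=1e-10))
```

The two representations describe the same process; the error-correction form simply isolates the matrix $\pi$ whose rank is of interest.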

As detailed in Appendix 6.1, the number of distinct cointegrating vectors can be
obtained by checking the significance of the characteristic roots of $\pi$. We know that
the rank of a matrix is equal to the number of its characteristic roots that differ from
zero. Suppose we obtained the matrix $\pi$ and ordered the n characteristic roots such that
$\lambda_1 > \lambda_2 > \cdots > \lambda_n$. If the variables in $x_t$ are not cointegrated, the rank of $\pi$ is zero and
all of these characteristic roots will equal zero. Since ln(1) = 0, each of the expressions
$\ln(1 - \lambda_i)$ will equal zero if the variables are not cointegrated. Similarly, if the rank of $\pi$
is unity, $0 < \lambda_1 < 1$ so the first expression $\ln(1 - \lambda_1)$ will be negative and all the other
$\lambda_i = 0$ so that $\ln(1 - \lambda_2) = \ln(1 - \lambda_3) = \cdots = \ln(1 - \lambda_n) = 0$.

In practice, we can obtain only estimates of $\pi$ and its characteristic roots. The test
for the number of characteristic roots that are insignificantly different from unity can
be conducted using the following two test statistics:

$$\lambda_{\text{trace}}(r) = -T\sum_{i=r+1}^{n}\ln(1 - \hat{\lambda}_i) \tag{6.55}$$

$$\lambda_{\text{max}}(r, r+1) = -T\ln(1 - \hat{\lambda}_{r+1}) \tag{6.56}$$

where $\hat{\lambda}_i$ = the estimated values of the characteristic roots (also called eigenvalues)
obtained from the estimated $\pi$ matrix
T = the number of usable observations

When the appropriate values of r are clear, these statistics are simply referred to as
$\lambda_{\text{trace}}$ and $\lambda_{\text{max}}$.

The first statistic tests the null hypothesis that the number of distinct cointegrating
vectors is less than or equal to r against a general alternative. From the previous
discussion, it should be clear that $\lambda_{\text{trace}}$ equals zero when all $\lambda_i = 0$. The further
the estimated characteristic roots are from zero, the more negative is $\ln(1 - \hat{\lambda}_i)$ and
the larger is the $\lambda_{\text{trace}}$ statistic. The second statistic tests the null that the number of
cointegrating vectors is r against the alternative of r + 1 cointegrating vectors. Again,
if the estimated value of the characteristic root is close to zero, $\lambda_{\text{max}}$ will be small.

Critical values of the $\lambda_{\text{trace}}$ and the $\lambda_{\text{max}}$ statistics are obtained using the Monte
Carlo approach. The critical values are reproduced in Table E in the Supplementary
Manual. The distribution of these statistics depends on two things:

1. The number of nonstationary components under the null hypothesis
(i.e., n - r).

2. The form of the vector $A_0$. Use the top portion of Table E if you do not include
either a constant in the cointegrating vector or a drift term. Use the middle
portion of the table if you include a drift term $A_0$. Use the bottom portion of
the table if you include a constant in the cointegrating vector.


Using quarterly data for Denmark over the sample period 1974:1 to 1987:3,
Johansen and Juselius (1990) let the $x_t$ vector be represented by

$$x_t = (m2_t, y_t, i_t^d, i_t^b)'$$

where m2 = log of the real money supply as measured by M2 deflated by a price
index
y = log of real income
$i^d$ = deposit rate on money representing a direct return on money holding
$i^b$ = bond rate representing the opportunity cost of holding money

Including a constant in the cointegrating relationship (i.e., augmenting $x_{t-1}$ with a
constant), they report that the residuals from (6.54) appear to be serially uncorrelated.


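The four characteristic roots of the estimated $\pi$ matrix are given in the first column
below, with the test statistics rounded to two decimal places:

                               $\lambda_{\text{max}}$                       $\lambda_{\text{trace}}$
                               $-T\ln(1 - \hat{\lambda}_{r+1})$             $-T\sum\ln(1 - \hat{\lambda}_i)$
$\hat{\lambda}_1$ = 0.4332     30.09                                        49.14
$\hat{\lambda}_2$ = 0.1776     10.36                                        19.05
$\hat{\lambda}_3$ = 0.1128     6.34                                         8.69
$\hat{\lambda}_4$ = 0.0434     2.35                                         2.35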


The second column reports the various $\lambda_{\text{max}}$ statistics as the number of usable
observations (T = 53) multiplied by $-\ln(1 - \hat{\lambda}_{r+1})$. For example, -53 ln(1 - 0.0434) =
2.35 and -53 ln(1 - 0.1128) = 6.34. The last column reports the $\lambda_{\text{trace}}$ statistics as the
summation of the $\lambda_{\text{max}}$ statistics. Simple arithmetic reveals that 8.69 = 2.35 + 6.34 and
19.05 = 2.35 + 6.34 + 10.36.
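The entries in the table can be reproduced directly from (6.55) and (6.56). The following Python sketch uses the four reported characteristic roots and T = 53; the variable names are hypothetical.

```python
import numpy as np

T = 53                                                    # usable observations
eigenvalues = np.array([0.4332, 0.1776, 0.1128, 0.0434])  # ordered largest first

# lambda_max(r, r+1) = -T ln(1 - lambda_{r+1}) for r = 0, 1, 2, 3
lam_max = -T * np.log(1.0 - eigenvalues)

# lambda_trace(r) = sum of the lambda_max statistics for roots r+1, ..., n
lam_trace = np.cumsum(lam_max[::-1])[::-1]

for r, (lm, lt) in enumerate(zip(lam_max, lam_trace)):
    print(f"r = {r}: lambda_max = {lm:.2f}, lambda_trace = {lt:.2f}")
```

Because the sketch sums unrounded values whereas the table adds the rounded $\lambda_{\text{max}}$ entries, the $\lambda_{\text{trace}}$ values may differ from the table in the second decimal place.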
To test the null hypothesis r = 0 against the general alternative r = 1, 2, 3, or 4,
use the $\lambda_{\text{trace}}$ statistic. Since the null hypothesis is r = 0 and there are four variables
(i.e., n = 4), the summation in (6.55) runs from 1 to 4. If we sum over the four values,
the calculated value of $\lambda_{\text{trace}}$ is 49.14. Since Johansen and Juselius (1990) include the
constant in the cointegrating vector, this calculated value of 49.14 is compared to the
critical values reported in the bottom portion of Table E. For n - r = 4, the critical
values of $\lambda_{\text{trace}}$ are 49.65, 53.12, and 60.16 at the 10, 5, and 1% significance levels,
respectively. Thus, at the 10% level, the restriction r = 0 is not binding (the null hypothesis
cannot be rejected), so that the variables are not cointegrated using this test.

To make a point and to give you practice in using the table, suppose you want
to test the null hypothesis $r \leq 1$ against the alternative r = 2, 3, or 4. Under this null
hypothesis, the summation in (6.55) runs from 2 to 4 so that the calculated value of
$\lambda_{\text{trace}}$ is 19.05. For n - r = 3, the critical values of $\lambda_{\text{trace}}$ are 32.00, 34.91, and 41.07 at
the 10, 5, and 1% significance levels, respectively. The restriction r = 0 or r = 1 is not
binding.

In contrast to the $\lambda_{\text{trace}}$ statistic, the $\lambda_{\text{max}}$ statistic has a specific alternative
hypothesis. To test the null hypothesis r = 0 against the specific alternative
r = 1, use equation (6.56). The calculated value of the $\lambda_{\text{max}}(0, 1)$ statistic is
-53 ln(1 - 0.4332) = 30.09. For n - r = 4, the critical values of $\lambda_{\text{max}}$ are 25.56,
28.14, 30.32, and 33.24 at the 10, 5, 2.5, and 1% significance levels, respectively.
Hence, it is possible to reject the null hypothesis r = 0 at the 5% significance level
(but not the 2.5% level) and conclude that there is only one cointegrating vector (i.e.,
r = 1). Before reading on, you should take a moment to examine the data and convince
yourself that the null hypothesis r = 1 against the alternative r = 2 cannot be rejected
at conventional levels. You should find that the calculated value of the $\lambda_{\text{max}}$ statistic
for r = 1 is 10.36 and that the critical value at the 10% level is 19.77. Hence, there is
no significant evidence of more than one cointegrating vector.

The example illustrates the important point that the results of the $\lambda_{\text{trace}}$ and $\lambda_{\text{max}}$
tests can conflict. The $\lambda_{\text{max}}$ test has the sharper alternative hypothesis. It is usually
preferred for trying to pin down the number of cointegrating vectors.
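The conflict between the two tests for the null r = 0 can be summarized in a few lines of Python. The critical values are those quoted above for n - r = 4 with a constant in the cointegrating vector; the function name `rejected_levels` is hypothetical.

```python
# Critical values quoted in the text for n - r = 4
trace_cv = {"10%": 49.65, "5%": 53.12, "1%": 60.16}
max_cv = {"10%": 25.56, "5%": 28.14, "2.5%": 30.32, "1%": 33.24}

lam_trace_0 = 49.14      # lambda_trace(0)
lam_max_0_1 = 30.09      # lambda_max(0, 1)

def rejected_levels(stat, critical_values):
    """Return the significance levels at which the null hypothesis is rejected."""
    return [level for level, cv in critical_values.items() if stat > cv]

print("trace test rejects r = 0 at:", rejected_levels(lam_trace_0, trace_cv))
print("max test rejects r = 0 at:", rejected_levels(lam_max_0_1, max_cv))
```

The trace statistic falls just short of its 10% critical value, whereas the maximum eigenvalue statistic exceeds its 10% and 5% critical values, which is exactly the conflict described above.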


8. HYPOTHESIS TESTING 


In the Dickey-Fuller tests discussed in Chapter 4, it was important to correctly ascertain
the form of the deterministic regressors. A similar situation applies in the Johansen
procedure. As you can see in Table E, the critical values of the $\lambda_{\text{trace}}$ and $\lambda_{\text{max}}$ statistics
are smallest without any deterministic regressors and largest with an intercept term
included in the cointegrating vector. Instead of cavalierly positing the form of $A_0$, it is
possible to test restricted forms of the vector.

One of the most interesting aspects of the Johansen procedure is that it allows for
testing restricted forms of the cointegrating vector(s). In a money demand study, you
might want to test restrictions concerning the long-run proportionality between money
and prices, or the sizes of the income and interest rate elasticities of demand for money.
In terms of equation (6.1) (i.e., $m_t = \beta_0 + \beta_1 p_t + \beta_2 y_t + \beta_3 r_t + e_t$), the restrictions of
interest are $\beta_1 = 1$, $\beta_2 > 0$, and $\beta_3 < 0$.

The key insight to all such hypothesis tests is that if there are r cointegrating vectors,
only these r linear combinations of the variables are stationary. All other linear
combinations are nonstationary. Thus, suppose you reestimate the model restricting the
parameters of $\pi$. If the restrictions are not binding, you should find that the number of
cointegrating vectors has not diminished.

To test for the presence of an intercept in the cointegrating vector as opposed to the
unrestricted drift $A_0$, estimate the two forms of the model. Denote the ordered characteristic
roots of the unrestricted $\pi$ matrix by $\hat{\lambda}_1, \hat{\lambda}_2, \ldots, \hat{\lambda}_n$ and the characteristic roots of
the model with the intercept(s) in the cointegrating vector(s) by $\hat{\lambda}_1^*, \hat{\lambda}_2^*, \ldots, \hat{\lambda}_n^*$. Suppose
that the unrestricted form of the model has r nonzero characteristic roots. Asymptotically,
the statistic

$$-T\sum_{i=r+1}^{n}\left[\ln(1 - \hat{\lambda}_i^*) - \ln(1 - \hat{\lambda}_i)\right] \tag{6.57}$$

has a $\chi^2$ distribution with (n - r) degrees of freedom.

The intuition behind the test is that all values of $\ln(1 - \hat{\lambda}_i^*)$ and $\ln(1 - \hat{\lambda}_i)$ should be
equivalent if the restriction is not binding. Hence, small values for the test statistic imply
that it is permissible to include the intercept in the cointegrating vector. However, the
likelihood of finding a stationary linear combination of the n variables is greater with the
intercept in the cointegrating vector than if the intercept is absent from the cointegrating
vector. Thus, a large value of $\hat{\lambda}_{r+1}^*$ [and a corresponding large value of $-T\ln(1 - \hat{\lambda}_{r+1}^*)$]
implies that the restriction artificially inflates the number of cointegrating vectors. Thus,
as proven by Johansen (1991), if the test statistic is sufficiently large, it is possible to
reject the null hypothesis of an intercept in the cointegrating vector(s) and conclude
that there is a linear trend in the variables. This is precisely the case represented by the
middle portion of Figure 6.3.

Johansen and Juselius (1990) test the restriction that their estimated Danish money
demand function does not have a drift. Since they found only one cointegrating vector
among m2, y, $i^d$, and $i^b$, set n = 4 and r = 1. The calculated value of the $\chi^2$ statistic in
(6.57) is 1.99. With three degrees of freedom, this is insignificant at conventional levels;
they conclude that the variables do not have a linear time trend and find it appropriate
to include the constant in the cointegrating vector.

In order to test other restrictions on the cointegrating vector, Johansen defines the
two matrices $\alpha$ and $\beta$, both of dimension $(n \times r)$ where $r$ is the rank of
$\pi$. The properties of $\alpha$ and $\beta$ are such that

$$\pi = \alpha\beta'$$

Note that $\beta$ is the matrix of cointegrating parameters and $\alpha$ is the matrix of
weights with which each cointegrating vector enters the $n$ equations of the VAR. In a
sense, $\alpha$ can be viewed as the matrix of the speed of adjustment parameters. Due to the
cross-equation restrictions, it is not possible to estimate $\alpha$ and $\beta$ using OLS.*
However, using maximum likelihood estimation, it is possible to (1) estimate (6.54) as an
error-correction model, (2) determine the rank of $\pi$, (3) use the $r$ most significant
cointegrating vectors to form $\beta'$, and (4) select $\alpha$ such that $\pi = \alpha\beta'$.
Question 5 at the end of this chapter asks you to find several such $\alpha$ and $\beta'$
matrices.

It is easy to understand the process in the case of a single cointegrating vector.
Given that rank($\pi$) = 1, the rows of $\pi$ are all linear multiples of each other. Hence,
the equations in (6.54) have the form

$$\begin{aligned}
\Delta x_{1t} &= \cdots + (\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1}) + \varepsilon_{1t} \\
\Delta x_{2t} &= \cdots + s_2(\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1}) + \varepsilon_{2t} \\
&\;\;\vdots \\
\Delta x_{nt} &= \cdots + s_n(\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \cdots + \pi_{1n}x_{nt-1}) + \varepsilon_{nt}
\end{aligned}$$

where the $s_i$ are scalars (with $s_1 = 1$) and, for notational simplicity, the matrices
$\pi_i\Delta x_{t-i}$ have not been written out.

Now define $\alpha_i = s_i\pi_{11}$ and $\beta_i = \pi_{1i}/\pi_{11}$ so that each equation can be
written as

$$\Delta x_{it} = \cdots + \alpha_i(x_{1t-1} + \beta_2x_{2t-1} + \cdots + \beta_nx_{nt-1}) + \varepsilon_{it} \qquad (i = 1, \ldots, n)$$

or in matrix form,

$$\Delta x_t = \sum_{i=1}^{p-1}\pi_i\Delta x_{t-i} + \alpha\beta'x_{t-1} + \varepsilon_t \tag{6.58}$$

where the single cointegrating vector is $\beta = (1, \beta_2, \beta_3, \ldots, \beta_n)'$ and the
speed of adjustment parameters are given by $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)'$.


Once $\alpha$ and $\beta'$ are determined, testing various restrictions on $\alpha$ and $\beta'$ is
straightforward if you remember the fundamental point that if there are $r$ cointegrating
vectors, only these $r$ linear combinations of the variables are stationary. Thus, the test
statistics involve comparing the number of cointegrating vectors under the null and
alternative hypotheses. Again, let $\hat{\lambda}_1, \hat{\lambda}_2, \ldots, \hat{\lambda}_n$ and
$\hat{\lambda}_1^*, \hat{\lambda}_2^*, \ldots, \hat{\lambda}_n^*$ denote the ordered characteristic roots of the
unrestricted and restricted models, respectively. To test restrictions on $\beta$, form the
test statistic

$$T\sum_{i=1}^{r}\left[\ln(1 - \hat{\lambda}_i^*) - \ln(1 - \hat{\lambda}_i)\right] \tag{6.59}$$

Asymptotically, this statistic has a $\chi^2$ distribution with degrees of freedom equal
to the number of restrictions placed on $\beta$. Small values of $\hat{\lambda}_i^*$ relative to
$\hat{\lambda}_i$ (for $i \le r$) imply a reduced number of cointegrating vectors. Hence, the
restriction embedded in the null hypothesis is binding if the calculated value of the test
statistic exceeds that in a $\chi^2$ table. For example, Johansen and Juselius test the
restriction that money and income move proportionally. Their estimated long-run
equilibrium relationship is

$$m2_t = 1.03y_t - 5.21i_t^b + 4.22i_t^d + 6.06$$

They restrict the coefficient of income to be unity and find the restricted values of
the $\hat{\lambda}_i^*$ to be such that

         $\hat{\lambda}_i^*$     $T\ln(1 - \hat{\lambda}_i^*)$
i = 1    0.433       -30.04
i = 2    0.172       -10.01
i = 3    0.044        -2.36
i = 4    0.006        -0.32

Given that the unrestricted model has $r = 1$ and $-T\ln(1 - \hat{\lambda}_1) = 30.09$, (6.59)
becomes $-30.04 + 30.09 = 0.05$. Since there is only 1 restriction imposed on $\beta$,
the test statistic has a $\chi^2$ distribution with 1 degree of freedom. A $\chi^2$ table
indicates that 0.05 is not significant; hence, they conclude that the restriction is not
binding.

Restrictions on $\alpha$ can be tested in the same way. The procedure is to restrict $\alpha$
and compare the $r$ most significant characteristic roots for the restricted and
unrestricted models using (6.59). If the calculated value of (6.59) exceeds that from a
$\chi^2$ table, with degrees of freedom equal to the number of restrictions placed on $\alpha$,
the restrictions can be rejected. For example, Johansen and Juselius (1990) test the
restriction that only money demand (i.e., $m2_t$) responds to the deviation from long-run
equilibrium. Formally, they test the restriction that $\alpha_2 = \alpha_3 = \alpha_4 = 0$.
Restricting the three values of $\alpha_i$ to equal zero, they find the largest characteristic
root in the restricted model is such that $T\ln(1 - \hat{\lambda}_1^*) = -23.42$. Since the
unrestricted model is such that $T\ln(1 - \hat{\lambda}_1) = -30.09$, equation (6.59) becomes
$-23.42 - (-30.09) = 6.67$. The $\chi^2$ statistic with three degrees of freedom is 7.81 at
the 5% significance level. Hence, they find mild support for the hypothesis that the
restriction is not binding.
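
The arithmetic behind the two Johansen and Juselius restriction tests can be checked with
a few lines of Python. This is only a sketch: the function name `lr_restriction_stat` and
the dictionary of critical values are illustrative choices, the inputs are the
$T\ln(1 - \hat{\lambda})$ values quoted in the text, and 3.84 is the standard tabulated 5%
critical value for one degree of freedom (the text itself quotes only 7.81 for three).

```python
def lr_restriction_stat(t_ln_restricted, t_ln_unrestricted):
    """Statistic (6.59): sum over the r retained roots of
    T*ln(1 - lambda*_i) - T*ln(1 - lambda_i).

    Each argument is a sequence of T*ln(1 - root) values, one per
    cointegrating vector (i = 1, ..., r)."""
    return sum(res - unres
               for res, unres in zip(t_ln_restricted, t_ln_unrestricted))

# 5% chi-square critical values by degrees of freedom.
CHI2_5PCT = {1: 3.84, 3: 7.81}

# Restriction on beta: unit income coefficient (r = 1, one restriction).
beta_stat = lr_restriction_stat([-30.04], [-30.09])

# Restriction on alpha: alpha_2 = alpha_3 = alpha_4 = 0 (three restrictions).
alpha_stat = lr_restriction_stat([-23.42], [-30.09])

for name, stat, df in [("beta", beta_stat, 1), ("alpha", alpha_stat, 3)]:
    verdict = "reject" if stat > CHI2_5PCT[df] else "do not reject"
    print(f"{name}: statistic = {stat:.2f}, df = {df}, "
          f"5% critical value = {CHI2_5PCT[df]} -> {verdict}")
```

In both cases the statistic falls below the 5% critical value, so neither restriction is
rejected; the restriction on $\alpha$ is the closer call.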

If there is a single cointegrating vector, the Engle-Granger and Johansen methods
have the same asymptotic distribution. If it can be determined that only one cointegrating
vector exists, it is also common to rely on the estimated error-correction model to
test restrictions on $\alpha$. If $r = 1$, and a single value of $\alpha$ is being tested, the
usual t-statistic is asymptotically equivalent to the Johansen test.


Lag Length and Causality Tests 


The simplest way to understand lag length tests is to consider the system in the form
of (6.54)

$$\Delta x_t = \pi x_{t-1} + \sum_{i=1}^{p-1}\pi_i\Delta x_{t-i} + \varepsilon_t$$

Regardless of the rank of $\pi$, all of the $\Delta x_{t-i}$ are stationary variables. Hence, we
can use Rule 1 of Sims, Stock, and Watson (1990). Recall that the rule implies that
the coefficients of interest on zero-mean stationary variables can be tested using a
normal distribution. Since lag length depends solely on the values of the various $\pi_i$, a
$\chi^2$ distribution is appropriate to test any restriction concerning lag length. As in the
case of any VAR, let $\Sigma_u$ and $\Sigma_r$ be the variance/covariance matrices of the
unrestricted and restricted systems, respectively. As in Chapter 5, let $c$ denote the
maximum number of regressors contained in the longest equation. The test statistic

$$(T - c)(\log|\Sigma_r| - \log|\Sigma_u|)$$

can be compared to a $\chi^2$ distribution with degrees of freedom equal to the number of
restrictions in the system. Alternatively, you can use the multivariate AIC or SBC to
determine the lag length. If you want to test the lag lengths for a single equation, an
F-test is appropriate.

The rule also means that you cannot perform Granger causality tests in a cointegrated
system using a standard F-test. First, suppose that rank($\pi$) = 0 so that

$$\Delta x_t = \sum_{i=1}^{p-1}\pi_i\Delta x_{t-i} + \varepsilon_t$$

As such, Granger causality involves only stationary variables. Yet, this was precisely
the case discussed in Chapter 5 when the variables in a VAR are not cointegrated.
Hence, Granger causality tests can be conducted using a standard F distribution. However,
if the variables are cointegrated, a Granger causality test involves the coefficients
of $\pi$. Since these coefficients multiply nonstationary variables, it is not appropriate to
use an F-statistic to test for Granger causality. After all, if rank($\pi$) $\ne 0$, it is
impossible to write the restrictions of the test as restrictions on a set of $I(0)$
variables. Block exogeneity tests are also ruled out. If $w_t$ is cointegrated with $y_t$ or
$z_t$, you cannot use a standard $\chi^2$ test to determine whether $w_t$ belongs in the equations
for $y_t$ and $z_t$.
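
The likelihood ratio lag-length statistic described above, $(T - c)(\log|\Sigma_r| - \log|\Sigma_u|)$,
is simple to compute once the two residual covariance matrices are in hand. The sketch
below uses only the standard library; the function names and the two covariance matrices
are hypothetical, chosen to mimic a two-variable VAR in which four lags plus an intercept
($c = 9$) are compared with one lag, so that $3n^2 = 12$ restrictions are tested.

```python
import math

def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination with pivoting."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det

def lag_length_lr(sigma_u, sigma_r, T, c):
    """(T - c) * (log|Sigma_r| - log|Sigma_u|)."""
    return (T - c) * (math.log(determinant(sigma_r))
                      - math.log(determinant(sigma_u)))

# Hypothetical residual covariance matrices (made-up numbers).
sigma_u = [[1.00, 0.20], [0.20, 0.80]]   # unrestricted: four lags
sigma_r = [[1.10, 0.25], [0.25, 0.90]]   # restricted: one lag

lr_stat = lag_length_lr(sigma_u, sigma_r, T=100, c=9)
CHI2_12_5PCT = 21.03  # standard tabulated 5% value for 12 degrees of freedom
print(f"LR = {lr_stat:.2f}; reject shorter lag length: {lr_stat > CHI2_12_5PCT}")
```

With real data the covariance matrices would come from the residuals of the two estimated
systems, both fitted over the same sample period.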


To Difference or Not to Difference 


We have reached a point where it is possible to address the issue of differencing the
nonstationary variables in an unrestricted VAR. There is no question that differencing
leads to a misspecification error if the variables are cointegrated. Suppose that the actual
data-generating process is given by the cointegrated system of (6.54) but you estimate
the following VAR in first differences:

$$\Delta x_t = \sum_{i=1}^{p-1}\pi_i\Delta x_{t-i} + \varepsilon_t$$

The system is misspecified since it excludes the long-run equilibrium relationships
among the variables that are contained in $\pi x_{t-1}$. Given the misspecification error, all of
the coefficient estimates, t-tests, F-tests, tests of cross-equation restrictions, impulse
responses, and variance decompositions are not representative of the true process.
Hence, there is a substantial penalty to pay if you estimate a VAR in first differences
when the data are actually cointegrated; differencing “throws away” information
contained in the cointegrating relationship(s).

Why not simply estimate all VARs in levels? The answer is that it is preferable
to use the first differences if the $I(1)$ variables are not cointegrated. There are three
consequences if the $I(1)$ variables are not cointegrated and you estimate the VAR in
levels:

1. Tests lose power because you estimate $n^2$ more parameters (one extra lag of
each variable in each equation).

2. For a VAR in levels, tests for Granger causality conducted on the $I(1)$ variables
do not have a standard F distribution. If you use first differences, you
can use the standard F distribution to test for Granger causality.

3. When the VAR has $I(1)$ variables, the impulse responses at long forecast
horizons are inconsistent estimates of the true responses. Since the impulse
responses need not decay, any imprecision in the coefficient estimates will
have a permanent effect on the impulse responses. If the VAR is estimated
in first differences, the impulse responses decay to zero and so the estimated
responses are consistent.


The suggestion is that it is important to properly determine whether the $I(1)$ variables
are cointegrated. You can perform lag length tests regardless of whether the
variables are cointegrated. As such, the suggested methodology is to estimate an
unrestricted VAR. Most researchers would begin with a lag length of approximately $T^{1/3}$.
You may want to alter the number of lags to correspond to the seasonal frequency of
the data. For example, with 100 observations of two variables using quarterly data, you
might want to begin with 8 lags even though $T^{1/3}$ is approximately five. Select the
most appropriate lag length and then perform a cointegration test. If the variables are
not cointegrated, estimate the system in first differences. If the variables are
cointegrated, you can work with the error-correction model. Since the error-correction term
and all values of $\Delta x_{t-i}$ are stationary, you can conduct inference on any variable
(except those appearing within the cointegrating vectors) using the usual test statistics.
Impulse responses and variance decompositions will yield consistent estimates of the
actual values.


Tests on Multiple Cointegrating Vectors 


If the rank of $\pi$ exceeds one, it is not straightforward to interpret the cointegrating
vectors. When there are multiple cointegrating vectors, any linear combination of these
vectors is also a cointegrating vector. Fortunately, it is often possible to identify
separate behavioral relationships by appropriately restricting the individual cointegrating
vectors. The only complication is that you need to be clear about the number of
restrictions you impose on the system. It is important to note that if there are $r$
cointegration relationships in an $n$-variable system, there exists a cointegrating vector
for each subset of $(n - r + 1)$ variables. For example, if there are two cointegrating
vectors in a three-variable system, there is a cointegrating vector for each bilateral pair
of the variables ($2 = n - r + 1$). To demonstrate the point, let
$x_t = (x_{1t}, x_{2t}, x_{3t}, x_{4t})'$ and suppose there are two cointegrating vectors for these
four variables. If we normalize each vector with respect to $x_{1t}$, we can write the two
independent relationships in $\beta'x_t = 0$ as

$$\begin{bmatrix} 1 & -\beta_{12} & -\beta_{13} & -\beta_{14} \\ 1 & -\beta_{22} & -\beta_{23} & -\beta_{24} \end{bmatrix}
\begin{bmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \\ x_{4t} \end{bmatrix} = 0$$

Consider the $2 \times n$ matrix $\beta'$ consisting of the cointegrating parameters. Subtract
row 1 from row 2 to obtain

$$\begin{bmatrix} 1 & -\beta_{12} & -\beta_{13} & -\beta_{14} \\ 0 & -\beta_{22} + \beta_{12} & -\beta_{23} + \beta_{13} & -\beta_{24} + \beta_{14} \end{bmatrix}$$

Now, renormalize row 2 by dividing each of its elements by $(\beta_{12} - \beta_{22})$ to obtain

$$\begin{bmatrix} 1 & -\beta_{12} & -\beta_{13} & -\beta_{14} \\ 0 & 1 & -\beta_{23}^* & -\beta_{24}^* \end{bmatrix}$$

where $-\beta_{23}^* = (\beta_{13} - \beta_{23})/(\beta_{12} - \beta_{22})$ and
$-\beta_{24}^* = (\beta_{14} - \beta_{24})/(\beta_{12} - \beta_{22})$. Hence, $x_{2t}$, $x_{3t}$, and $x_{4t}$
are cointegrated such that $x_{2t} = \beta_{23}^*x_{3t} + \beta_{24}^*x_{4t}$. Similarly, add $\beta_{12}$
times row 2 to row 1 to obtain

$$\begin{bmatrix} 1 & 0 & -\beta_{13}^* & -\beta_{14}^* \\ 0 & 1 & -\beta_{23}^* & -\beta_{24}^* \end{bmatrix}$$

where $\beta_{1j}^* = \beta_{1j} + \beta_{12}\beta_{2j}^*$.

Thus, $x_{1t}$, $x_{3t}$, and $x_{4t}$ are cointegrated such that
$x_{1t} = \beta_{13}^*x_{3t} + \beta_{14}^*x_{4t}$. Since the labeling of the variables is irrelevant, it
follows that there exists a cointegrating vector for each subset of three variables. More
generally, $\beta'$ will be an $r \times n$ matrix of cointegrating parameters, and each subset of
$n - r + 1$ variables will be cointegrated. From the preceding discussion, it should be
clear that standard row and column operations on $\beta'$ do not entail restrictions on the
cointegrating vectors. Such operations merely result in additional cointegrating vectors
that are linear combinations of the original vectors.
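
The row operations described above can be traced numerically. The following Python sketch
uses exact fractions and a made-up $2 \times 4$ matrix $\beta'$ (the coefficient values are
purely illustrative); it subtracts row 1 from row 2, renormalizes, and then eliminates
$x_{2t}$ from row 1.

```python
from fractions import Fraction as F

# Hypothetical beta' with each row normalized on x1: (1, -b_i2, -b_i3, -b_i4).
beta_t = [
    [F(1), F(-3), F(-1), F(-5)],   # b12 = 3, b13 = 1, b14 = 5
    [F(1), F(-1), F(-4), F(-1)],   # b22 = 1, b23 = 4, b24 = 1
]
b12 = -beta_t[0][1]
b22 = -beta_t[1][1]

# Subtract row 1 from row 2, which removes x1 from the second vector.
row2 = [second - first for first, second in zip(beta_t[0], beta_t[1])]

# Renormalize row 2 by dividing each element by (b12 - b22).
row2 = [value / (b12 - b22) for value in row2]

# Add b12 times the new row 2 to row 1, which removes x2 from the first vector.
row1 = [first + b12 * second for first, second in zip(beta_t[0], row2)]

print("x2, x3, x4 relationship:", row2)   # coefficients (0, 1, -b*23, -b*24)
print("x1, x3, x4 relationship:", row1)   # coefficients (1, 0, -b*13, -b*14)

# Any x satisfying the original two relationships satisfies the new ones.
x3, x4 = F(2), F(1)
x2 = -(row2[2] * x3 + row2[3] * x4)
x1 = -(row1[2] * x3 + row1[3] * x4)
x = [x1, x2, x3, x4]
residuals = [sum(c * v for c, v in zip(row, x)) for row in beta_t]
```

Because only row operations are used, the transformed vectors span the same cointegrating
space as the originals; no testable restriction has been imposed.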


EXAMPLE 1: VARIABLE EXCLUSION WITHIN AN EQUATION With
multiple cointegrating vectors, you cannot test whether any one particular $\beta_{ij} = 0$
since this assumption does not restrict the cointegrating space. In the general case where
$\beta'$ is an $r \times n$ matrix, a testable exclusion restriction entails the exclusion of $r$ or
more variables from a cointegrating vector. Hence, excluding $r$ variables from a
cointegrating vector entails only one restriction. If the sample value of the $\chi^2$
statistic with one degree of freedom (since there is only one restriction involved) exceeds
a critical value, reject the null hypothesis that this set of variables contains a
cointegrating relationship.


EXAMPLE 2: VARIABLE EXCLUSION ACROSS EQUATIONS Next, suppose
that you want to test whether $x_{4t}$ can be excluded from the set of cointegrating
relationships. The restriction $\beta_{14} = \beta_{24} = 0$ entails only one restriction on the
cointegrating space. In the general case where $\beta'$ is an $r \times n$ matrix, the test
$\beta_{1j} = \beta_{2j} = \cdots = \beta_{rj} = 0$ still involves only one restriction. This follows
since $x_j$ can be eliminated from $r - 1$ equations using simple row and column operations.


EXAMPLE 3: CONDITIONAL RESTRICTIONS It is also possible to restrict
one cointegrating vector conditional on the values of all other cointegrating vectors.
For example, you might want to determine if $(1, 0, \beta_{23}, \beta_{24})'$ is a cointegrating
vector for the given normalized values of $\beta_{12}$, $\beta_{13}$, and $\beta_{14}$. Thus, you fix
the values of $\beta_{12}$, $\beta_{13}$, and $\beta_{14}$ and determine whether you can exclude $x_{2t}$
from the second vector. Cutler, Davis, and Smith (1999) consider the identification issue
in considerable detail. They examine the following four behavioral relationships in a
seven-variable system:

$$\begin{aligned}
m_t &= d_0 + d_1y_t + d_2r_t + d_3p_t + e_{1t} \\
c_t &= a_0 + a_1y_t + a_2r_t + e_{2t} \\
i_t &= b_0 + b_1y_t + b_2r_t + e_{3t} \\
im_t &= g_0 + g_1y_t + g_2r_t + e_{4t}
\end{aligned}$$

where $m_t$ = log of nominal money holdings
$y_t$ = log of real income
$r_t$ = real interest rate
$c_t$ = log of real consumption
$i_t$ = log of real investment
$p_t$ = log of the price level
$im_t$ = log of real imports
$e_{1t}$, $e_{2t}$, $e_{3t}$, and $e_{4t}$ = stationary error terms

The first equation is the money demand equation. The next three equations are a
simple consumption function, an investment function, and an import demand function,
respectively. Consumption, investment, and imports are each assumed to be functions
of only income and the interest rate. The issue is to determine whether it is possible
to identify these four equations from a seven-variable system. Toward this end, they
obtained estimates of a $7 \times 7$ $\pi$ matrix over a number of sample periods. There were
at least four cointegrating vectors in every case considered. Over the entire sample,
1960Q2 to 1990Q4, Cutler, Davis, and Smith (1999) found that they could not reject
the restrictions at conventional significance levels (the prob-value was 16%).


The Test in the Presence of I(2) Variables


It is also possible to test for multicointegration using Johansen’s methodology. Consider
the VAR system:

$$\Delta^2x_t = \pi x_{t-1} + \Gamma\Delta x_{t-1} + \sum_{i=1}^{p-2}\pi_i\Delta^2x_{t-i} + \varepsilon_t \tag{6.60}$$

The issue of multicointegration concerns the ranks of both $\pi$ and $\Gamma$. In principle,
it is possible to consider all possible orders of cointegration for the variables in the
system. However, to illustrate the procedure, it is useful to begin with a three-variable
system consisting of the three $I(2)$ variables $x_{1t}$, $x_{2t}$, and $x_{3t}$ that are
multicointegrated such that

$$\pi_{11}x_{1t} + \pi_{12}x_{2t} + \pi_{13}x_{3t} + \Gamma_{11}\Delta x_{1t} + \Gamma_{12}\Delta x_{2t} + \Gamma_{13}\Delta x_{3t} = 0$$

Let $r$ denote the rank of $\pi$ and $r_1$ denote the rank of $\Gamma$ so that (6.60) is such that
$r = r_1 = 1$. Clearly, if $r = 0$, multicointegration fails since there is no linear
combination of the three $I(2)$ variables that forms an equilibrium relationship. If $r = 1$
and $r_1 = 0$, the equilibrium relationship has the form
$\pi_{11}x_{1t} + \pi_{12}x_{2t} + \pi_{13}x_{3t} = 0$. As such, $\Delta^2x_t = \pi x_{t-1} + I(0)$ variables so
that $\pi_{11}x_{1t} + \pi_{12}x_{2t} + \pi_{13}x_{3t}$ is necessarily a stationary relationship; the
variables are $CI(2, 2)$. All of this may seem straightforward, but there is a complicating
factor when the ranks of $\pi$ and $\Gamma$ have to be estimated. To illustrate the point,
suppose that the $I(2)$ variables are cointegrated such that

$$\pi_{11}x_{1t} + \pi_{12}x_{2t} + \pi_{13}x_{3t} \sim I(1)$$

where $\sim I(d)$ indicates the order of integration.

If you take the first difference, it follows that
$\pi_{11}\Delta x_{1t} + \pi_{12}\Delta x_{2t} + \pi_{13}\Delta x_{3t}$ is $I(0)$. You should be able to figure out
the problem. For any cointegrating vector in $\pi$, it is possible to estimate an identical
cointegration vector for the first differences of the variables. Yet a linear combination
of the two relationships is not stationary. Consider the result obtained by subtracting the
$I(0)$ relationship from the $I(1)$ relationship:

$$\pi_{11}x_{1t} + \pi_{12}x_{2t} + \pi_{13}x_{3t} - \pi_{11}\Delta x_{1t} - \pi_{12}\Delta x_{2t} - \pi_{13}\Delta x_{3t}
= \pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \pi_{13}x_{3t-1}$$

Since $\pi_{11}x_{1t-1} + \pi_{12}x_{2t-1} + \pi_{13}x_{3t-1}$ is $I(1)$, all that has been done is to
change the time subscript for the variables in the cointegrating relationship. The point
is that it is necessary to find cointegrating vectors in $\Gamma$ that are not linear
combinations of those in $\pi$.

If we take the more general case considered by Johansen (1995), let rank($\pi$) = $r$
and let $s$ denote the number of cointegrating vectors in $\Gamma$ that are orthogonal to those
in $\pi$. In an $n$-variable system such that some of the variables are $I(2)$, you should be
able to verify that:

1. If $r = 0$, there is no relationship among the variables that is stationary.

2. In a system with $n$ variables, if $r + s = n - 1$, there is a unique multicointegrating
vector. The number of $I(2)$ stochastic trends in an $n$-variable system is
given by $n - r - s$.

3. The value of $s$ must be such that $s \le n - r$. For the analysis of $I(2)$ variables to
be appropriate, the values of $r$ and $s$ must be such that $s + r < n$. If $s = n - r$,
then $x_t$ contains no $I(2)$ variables.
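
The three counting rules listed above can be collected in a small helper. This is a
minimal sketch: the function name `i2_summary` and the keys of the returned dictionary are
hypothetical labels, and the function does nothing more than restate the rules for given
values of $n$, $r$, and $s$.

```python
def i2_summary(n, r, s):
    """Summarize the I(2) counting rules.

    n: number of variables
    r: rank of pi
    s: number of cointegrating vectors in Gamma orthogonal to those in pi
    """
    if not 0 <= r <= n:
        raise ValueError("r must satisfy 0 <= r <= n")
    if not 0 <= s <= n - r:
        raise ValueError("s must satisfy 0 <= s <= n - r")
    return {
        # Rule 1: with r = 0 no stationary relationship exists.
        "stationary_relationship": r > 0,
        # Rule 2: r + s = n - 1 gives a unique multicointegrating vector.
        "unique_multicointegrating_vector": r + s == n - 1,
        "i2_trends": n - r - s,
        # Rule 3: s = n - r means x_t contains no I(2) variables.
        "contains_i2_variables": r + s < n,
    }

three_var = i2_summary(n=3, r=1, s=1)
no_i2 = i2_summary(n=3, r=1, s=2)
print(three_var)
print(no_i2)
```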


Johansen’s cointegration test with $I(2)$ variables is actually a two-step procedure.
In the first step, you estimate a model as in (6.60) to determine the rank of $\pi$. Determine
the value of $r$ using the $\lambda_{max}$ and $\lambda_{trace}$ statistics in the usual way. In the
second step, you determine the value of $s$ conditional on the value of $r$. Let the null
hypothesis be $s = s_0$ and consider

$$Q_{r,s} = -T\sum_{i=s_0+1}^{n}\ln(1 - \hat{\lambda}_i) \tag{6.61}$$

Hence, $Q_{r,s}$ is constructed in the same fashion as a $\lambda_{trace}$ statistic. The principal
differences are that you test the rank of $\Gamma$ conditional on the value of $r$ and that you
obtain the number of cointegrating vectors orthogonal to those in $\pi$. As such, the critical
values needed to determine the value of $s$ have to be modified. Given the value of $r$,
if the sample value of $Q_{r,s}$ exceeds the critical value calculated by Johansen, reject the
null hypothesis $s = s_0$ in favor of the alternative $s > s_0$. For $r = 1$, the critical
values at the 10, 5, and 1% significance levels are

Critical Values for $Q_{r,s}$

         s = 0     s = 1
10%      31.88     17.79
5%       34.80     19.99
1%       40.84     24.74

For example, let $r = 1$ and suppose that the sample value of $Q_{r,s}$ is found to be
35.00. As such, the null hypothesis $s = 0$ can be rejected at the 5% significance level.


9. ILLUSTRATING THE JOHANSEN 
METHODOLOGY 


An interesting way to illustrate the Johansen methodology is to use exactly the same
data shown in Figure 6.2. Recall that the data are contained in the file COINT6.XLS.
Although the Engle-Granger technique did find that the simulated data were cointegrated,
a comparison of the two procedures is useful. Use the following four steps when
implementing the Johansen procedure.


STEP 1: It is good practice to pretest all variables to assess their order of integra- 
tion. Plot the data to see if a linear time trend is likely to be present in the 
data-generating process. In most instances you will have variables that are 
integrated of the same order. In other cases, you can check for multicointe- 
gration. 

The results of the test can be quite sensitive to the lag length, so it is
important to be careful. The most common procedure is to estimate a vector
autoregression using the undifferenced data. Then use the same lag-length
tests as in a traditional VAR. Begin with the longest lag length deemed reasonable
and test whether it can be shortened. For example, if we want to test
whether lags 2 through 4 are important, we can estimate the following two
VARs:

$$x_t = A_0 + A_1x_{t-1} + A_2x_{t-2} + A_3x_{t-3} + A_4x_{t-4} + e_{1t}$$
$$x_t = A_0 + A_1x_{t-1} + e_{2t}$$

where $x_t$ = the $(n \times 1)$ vector of variables
$A_0$ = $(n \times 1)$ matrix of intercept terms
$A_i$ = $(n \times n)$ matrices of coefficients
$e_{1t}$ and $e_{2t}$ = $(n \times 1)$ vectors of error terms.

Estimate the first system with four lags of each variable in each equation
and call the variance/covariance matrix of residuals $\Sigma_4$. Now estimate the
second system using only one lag of each variable in each equation and call
the variance/covariance matrix of residuals $\Sigma_1$. Even though we are working
with nonstationary variables, we can perform lag length tests using the
likelihood ratio test statistic recommended by Sims (1980):

$$(T - c)(\log|\Sigma_1| - \log|\Sigma_4|)$$

where $T$ = number of observations
$c$ = number of parameters in the unrestricted system
$\log|\Sigma_i|$ = natural logarithm of the determinant of $\Sigma_i$

Following Sims, use the $\chi^2$ distribution with degrees of freedom equal
to the number of coefficient restrictions. Since each $A_i$ has $n^2$ coefficients,
constraining $A_2 = A_3 = A_4 = 0$ entails $3n^2$ restrictions. Alternatively, you
can select lag length $p$ using the multivariate generalizations of the AIC
or SBC. In the model at hand, you should find that the general-to-specific
method and the AIC select a lag length of 2 whereas the SBC selects a lag
length of 1.

STEP 2: Estimate the model and determine the rank of $\pi$. Many time-series
statistical software packages contain a routine to estimate the model. Here, it
suffices to say that OLS is not appropriate because it is necessary to impose
cross-equation restrictions on the $\pi$ matrix. In most circumstances, you may
choose to estimate the model in three forms: (1) with all elements of $A_0$ set
equal to zero, (2) with a drift, or (3) with a constant term in the cointegrating
vector.

For example, we can use the simulated data shown in Figure 6.2 so
that $x_t = (y_t, z_t, w_t)'$. If we pretend that we do not know the form of the
data-generating process, we might want to include an intercept term in the
cointegrating vector(s). As we saw in the last section, it is possible to test for
the presence of the intercept. If we follow the general-to-specific method and
use a lag length of 2, the estimated model has the form

$$\Delta x_t = A_0 + \pi x_{t-1} + \pi_1\Delta x_{t-1} + \varepsilon_t \tag{6.62}$$

where $A_0$ was constrained so as to force the intercept to appear in the
cointegrating vector.

As always, carefully analyze the properties of the residuals of the estimated
model. Any evidence that the errors are not white noise usually means
that lag lengths are too short. Figure 6.4 shows deviations of $y_t$ from the
long-run relationship ($u_t = -0.01331 - y_t - 1.0350z_t + 1.0162w_t$) and one
of the error sequences (i.e., the $\{\varepsilon_{yt}\}$ sequence that equals the residuals
from the $y_t$ equation in (6.62)). Both sequences conform to their theoretical
properties in that the residuals from the long-run equilibrium appear
to be stationary and the estimated values of the $\{\varepsilon_{yt}\}$ series approximate a
white-noise process.

The estimated values of the characteristic roots of the $\pi$ matrix in (6.62)
are

$$\hat{\lambda}_1 = 0.32600; \quad \hat{\lambda}_2 = 0.14032; \quad \text{and} \quad \hat{\lambda}_3 = 0.033168$$

Since $T = 98$ (100 observations less the two lost as a result of using 2
lags), the calculated values of $\lambda_{max}$ and $\lambda_{trace}$ for the various possible values
of $r$ are reported in the center column of Table 6.6.

FIGURE 6.4 Long-Run and Short-Run Errors

Table 6.6 The $\lambda_{max}$ and $\lambda_{trace}$ Tests

Null          Alternative                 95% Critical   90% Critical
Hypothesis    Hypothesis      Value       Value          Value
$\lambda_{trace}$ tests ($\lambda_{trace}$ value)
r = 0         r > 0           56.786      34.91          32.00
r ≤ 1         r > 1           18.123      19.96          17.85
r ≤ 2         r > 2            3.306       9.24           7.52
$\lambda_{max}$ tests ($\lambda_{max}$ value)
r = 0         r = 1           38.663      22.00          19.77
r = 1         r = 2           14.817      15.67          13.75
r = 2         r = 3            3.306       9.24           7.52

Consider the hypothesis that the variables are not cointegrated (so that
the rank $\pi = 0$). Depending on the alternative hypothesis, we have a choice
of two possible test statistics. If we are interested in the hypothesis that the
variables are not cointegrated ($r = 0$) against the alternative of one or more
cointegrating vectors ($r > 0$), we can calculate the $\lambda_{trace}(0)$ statistic:

$$\begin{aligned}
\lambda_{trace}(0) &= -T[\ln(1 - \hat{\lambda}_1) + \ln(1 - \hat{\lambda}_2) + \ln(1 - \hat{\lambda}_3)] \\
&= -98[\ln(1 - 0.326) + \ln(1 - 0.14032) + \ln(1 - 0.033168)] \\
&= 56.786
\end{aligned}$$
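
All of the statistics in Table 6.6 follow directly from the three estimated characteristic
roots and $T = 98$. The Python sketch below reproduces them; the function name
`johansen_statistics` is an illustrative choice, and the 5% critical values are simply
copied from the table.

```python
import math

def johansen_statistics(roots, T):
    """Return (trace, lam_max) lists indexed by the null rank 0, 1, ..., n-1.

    roots must be ordered from largest to smallest.
    trace[r]   = -T * sum_{i=r+1}^{n} ln(1 - lambda_i)
    lam_max[r] = -T * ln(1 - lambda_{r+1})
    """
    terms = [-T * math.log(1.0 - lam) for lam in roots]
    trace = [sum(terms[r:]) for r in range(len(terms))]
    lam_max = list(terms)
    return trace, lam_max

roots = [0.32600, 0.14032, 0.033168]
trace, lam_max = johansen_statistics(roots, T=98)

# 5% critical values as reported in Table 6.6.
crit_trace_5pct = [34.91, 19.96, 9.24]
crit_max_5pct = [22.00, 15.67, 9.24]

for r in range(len(roots)):
    print(f"null rank {r}: trace = {trace[r]:.3f} (5% cv {crit_trace_5pct[r]}), "
          f"max = {lam_max[r]:.3f} (5% cv {crit_max_5pct[r]})")
```

Comparing each statistic with its critical value, only the tests with a null of $r = 0$
reject at the 5% level.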


Since 56.786 exceeds the 5% critical value of the $\lambda_{trace}$ statistic (in the
bottom panel of Table E, the critical value is 34.91), it is possible to reject the
null hypothesis of no cointegrating vectors and accept the alternative of one
or more cointegrating vectors. Next, we can use the $\lambda_{trace}(1)$ statistic to test
the null of $r \le 1$ against the alternative of two or three cointegrating vectors.
In this case, the $\lambda_{trace}(1)$ statistic is

$$\begin{aligned}
\lambda_{trace}(1) &= -T[\ln(1 - \hat{\lambda}_2) + \ln(1 - \hat{\lambda}_3)] \\
&= -98[\ln(1 - 0.14032) + \ln(1 - 0.033168)] \\
&= 18.123
\end{aligned}$$

Since 18.123 is less than the 5% critical value of 19.96, we cannot reject
the null hypothesis at this significance level. However, 18.123 does exceed
the 10% critical value of 17.85; some researchers might reject the null and
accept the alternative of two or three cointegrating vectors. The $\lambda_{trace}(2)$
statistic indicates no more than two cointegrating vectors at the 10% significance
level.

The $\lambda_{max}$ statistic does not help to clarify the issue. The null hypothesis
of no cointegrating vectors ($r = 0$) against the specific alternative $r = 1$
is clearly rejected. The calculated value $\lambda_{max}(0, 1) = -98\ln(1 - 0.326) = 38.663$
exceeds the 5% critical value of 22.00. Note that the test of the null
hypothesis $r = 1$ against the specific alternative $r = 2$ cannot be rejected at
the 5%, but can be rejected at the 10%, significance level. The calculated
value of $\lambda_{max}(1, 2)$ is $-98\ln(1 - 0.14032) = 14.817$, whereas the critical
values at the 5 and 10% significance levels are 15.67 and 13.75, respectively.
Even though the actual data-generating process contains only one cointegrating
vector, the realizations are such that researchers willing to use the
10% significance level would incorrectly conclude that there are two cointegrating
vectors. Failing to reject an incorrect null hypothesis is always an
inherent danger of using wide confidence intervals.

STEP 3: Analyze the normalized cointegrating vector(s) and speed of adjustment
coefficients. If we select $r = 1$, the estimated cointegrating vector
$(\beta_0, \beta_1, \beta_2, \beta_3)$ is

$$\beta = (0.00553, 0.41532, 0.42988, -0.42207)$$

If we normalize with respect to $\beta_1$, the normalized cointegrating vector
and the speed of adjustment parameters are

$$\beta = (-0.01331, -1.0000, -1.0350, 1.0162)$$

$$\alpha_y = 0.54627 \qquad \alpha_z = 0.16578 \qquad \alpha_w = 0.21895$$

Recall that the data were constructed imposing the long-run relationship
$w_t = y_t + z_t$; hence, the estimated coefficients of the normalized $\beta$
vector are close to their theoretical values of $(0, -1, -1, 1)$. Consider the
following tests:


1. The test that $\beta_0 = 0$ entails one restriction on one cointegrating vector;
hence, the likelihood ratio test has a $\chi^2$ distribution with one degree of
freedom. The calculated value of $\chi^2 = 0.011234$ is not significant at
conventional levels. Hence, we cannot reject the null hypothesis that $\beta_0 = 0$.
Thus, it is possible to use the form of the model in which there is neither
a drift nor an intercept in the cointegrating vector. To clarify the
issue concerning the number of cointegrating vectors, it would be wise
to reestimate the model excluding the constant from the cointegrating
vector.

2. To restrict the normalized cointegrating vector such that $\beta_2 = -1$ and
$\beta_3 = 1$ entails two restrictions on one cointegrating vector; hence, the
likelihood ratio test has a $\chi^2$ distribution with two degrees of freedom. The
calculated value of $\chi^2 = 0.55350$ is not significant at conventional levels.
Hence, we cannot reject the null hypothesis that $\beta_2 = -1$ and $\beta_3 = 1$.

3. To test the joint restriction $\beta = (0, -1, -1, 1)$ entails the three restrictions
$\beta_0 = 0$, $\beta_2 = -1$, and $\beta_3 = 1$. The calculated value of $\chi^2$ with three
degrees of freedom is 1.8128 so that the significance level is 0.612.
Hence, we cannot reject the null hypothesis that the cointegrating vector
is $(0, -1, -1, 1)$.
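
The normalization in Step 3 and the significance levels of the three tests can be verified
with the standard library alone. In the sketch below, `chi2_sf` is a hypothetical helper
that uses the closed-form upper-tail probabilities of the chi-square distribution for one,
two, and three degrees of freedom; it is not a general-purpose routine.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for df = 1, 2, or 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("this sketch only covers df = 1, 2, 3")

# Normalize the estimated vector so that the coefficient on y_t equals -1.
beta = [0.00553, 0.41532, 0.42988, -0.42207]
normalized = [b / -beta[1] for b in beta]
print("normalized beta:", [round(b, 4) for b in normalized])

# Test statistics and degrees of freedom reported for tests 1 to 3.
tests = [(0.011234, 1), (0.55350, 2), (1.8128, 3)]
p_values = [chi2_sf(stat, df) for stat, df in tests]
for (stat, df), p in zip(tests, p_values):
    print(f"chi2({df}) = {stat}: significance level = {p:.3f}")
```

Each significance level is far above the conventional 5% and 10% thresholds, consistent
with the failure to reject any of the three restrictions.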

STEP 4: Finally, innovation accounting and causality tests on the error-correction
model of (6.62) could help to identify a structural model and determine whether
the estimated model appears to be reasonable. Since the simulated data
have no economic meaning, innovation accounting is not performed here.


10. ERROR-CORRECTION AND ADL TESTS 


In the Engle—Granger method, it is possible to estimate the long-run equilibrium rela- 
tionship from a regression of z, on y, or from a regression of y, on z,. In the Johansen 
method, all variables are treated symmetrically. Hence, either method can be used in 
circumstances when you do not want to explicitly specify a “dependent” variable and 
a set of “independent” variables. This can be especially advantageous if the variables 
are jointly determined and you are not sure how to disentangle the interdependence 
among them. For example, in a test for purchasing power parity, it is likely that the 
exchange rate and the two price levels all have strong effects on each other. In other 
circumstances, the selection of a dependent variable and the set of independent vari- 
ables might be clear. As discussed in this section, there are potential benefits to be 
had by incorporating such information into a cointegration model. The starting point is 
to be precise about the econometric meaning of the term “exogenous.” To begin with 
the simplest case, suppose that y, and z, are cointegrated of order (1, 1) and that the 
error-correcting model (ECM) is represented by 


Ay, = 07-1 — Pzr) + er (6.63) 
Az, = 49(Y;_1 = Pzr) + ez (6.64) 


Notice that (6.63) and (6.64) are in reduced form and not in structural form. In 
order to allow for the possibility that the error terms are correlated, we can let the 
relationship between the error terms and the structural shocks be given by 


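$$\begin{bmatrix} e_{1t} \\ e_{2t} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} \begin{bmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{bmatrix}$$

where $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are the structural innovations in $\Delta y_t$ and $\Delta z_t$, and the $c_{ij}$ are
coefficients. As in the discussion of structural VARs in Section 10 of Chapter 5, the structural
shocks are uncorrelated in that $E\varepsilon_{yt}\varepsilon_{zt} = 0$. Even though $E\varepsilon_{yt}\varepsilon_{zt} = 0$, $e_{1t}$ and $e_{2t}$ will
generally be correlated if $c_{12}$ and/or $c_{21}$ differ from zero.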

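For now, suppose that the values of the $c_{ij}$ are unknown. Nevertheless, it is always
possible to construct an orthogonalization between the two errors such that

$$e_{1t} = \rho e_{2t} + v_t \tag{6.65}$$

where $\rho$ is the regression coefficient of $e_{1t}$ on $e_{2t}$ and $v_t$ is the innovation in $e_{1t}$ that is
not correlated with $e_{2t}$. If we substitute (6.64) and (6.65) into (6.63), we obtain

$$\begin{aligned}
\Delta y_t &= \alpha_1 (y_{t-1} - \beta z_{t-1}) + \rho e_{2t} + v_t \\
&= \alpha_1 (y_{t-1} - \beta z_{t-1}) + \rho[\Delta z_t - \alpha_2 (y_{t-1} - \beta z_{t-1})] + v_t \\
&= (\alpha_1 - \rho\alpha_2)(y_{t-1} - \beta z_{t-1}) + \rho \Delta z_t + v_t
\end{aligned}$$

Now, if we let $\alpha = \alpha_1 - \rho\alpha_2$, we can write

$$\Delta y_t = \alpha (y_{t-1} - \beta z_{t-1}) + \rho \Delta z_t + v_t \tag{6.66}$$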


In general, it is not appropriate to estimate (6.66) directly since it contains the
jointly determined variables $\Delta y_t$ and $\Delta z_t$. The general problem is that $\Delta z_t$ will be
correlated with the error term $v_t$ so that there is a simultaneity problem. As such, OLS
cannot be used to recover meaningful estimates of the parameters of the model. Even if
the simultaneity problem is rectified, there is an identification problem since $\alpha_1$ and $\alpha_2$
cannot be separately identified from the OLS estimate of $\alpha$. However, it is possible to
specify conditions such that the simultaneity and identification problems disappear and
that OLS is an efficient estimation and testing strategy. As will be shown below, the two
conditions are $\alpha_2 = 0$ (so that $z_t$ does not respond to the discrepancy from the long-run
equilibrium relationship) and $c_{21} = 0$ (so that $z_t$ does not respond to $\varepsilon_{yt}$). Thus, the two
required assumptions are that $z_t$ is weakly exogenous and causally prior to $y_t$.


Cointegration with Weak Exogeneity 


Following Engle, Hendry, and Richard (1983), a variable $x_{it}$ is weakly exogenous for
the parameter set P if the marginal distribution of $x_{it}$ contains no useful information
for conducting inference on P. In a cointegrated system, if a variable does not respond
to the discrepancy from the long-run equilibrium relationship, it is weakly exogenous.
Hence, if the speed of adjustment parameter $\alpha_i$ is zero, the variable in question is weakly
exogenous. In the example used by Johansen and Juselius (1990), it might be possible
to argue that real income should be weakly exogenous. After all, in a full-employment
environment, discrepancies between long-run money demand and supply would not be
expected to change real income. For our purposes, the practical importance is that a
weakly exogenous variable does not experience the type of feedback that necessitates
the use of a VAR.

To explain, suppose that you try to estimate an equation like (6.66) using OLS. You
could use a two-step method, such as that employed in the Engle-Granger procedure,
and regress $y_t$ on $z_t$ to obtain an estimate of $\beta$ and then form the variable $y_{t-1} - \beta z_{t-1}$.
However, at this point in time, the preference in the literature is to estimate the
unrestricted equation

$$\Delta y_t = \beta_1 y_{t-1} + \beta_2 z_{t-1} + \beta_3 \Delta z_t + v_t \tag{6.67}$$

where from (6.66) the estimated coefficients are such that $\beta_1 = \alpha_1 - \rho\alpha_2$,
$\beta_2 = -(\alpha_1 - \rho\alpha_2)\beta$, and $\beta_3 = \rho$.

Since the coefficients of (6.67) are unrestricted, this form of the model is often
called an autoregressive distributed lag (ADL) model to distinguish it from an ECM in the form of
(6.66). Notice that the value of $\alpha_2$ appears in the estimates for $\beta_1$ and $\beta_2$. However, if
$z_t$ is weakly exogenous (i.e., if $\alpha_2 = 0$), your coefficient estimates should be such that
$\beta_1 = \alpha_1$, $\beta_2 = -\alpha_1\beta$, and $\beta_3 = \rho$. Thus, you can identify $\alpha_1$, $\beta$, and $\rho$ from $\beta_1$, $\beta_2$, and $\beta_3$
since the OLS estimation of (6.67) is equivalent to estimating the equation

$$\Delta y_t = \alpha_1 y_{t-1} - \alpha_1 \beta z_{t-1} + \rho \Delta z_t + v_t \tag{6.68}$$
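
The identification argument can be checked on simulated data. The following is a minimal sketch, not a procedure from the text: the parameter values ($\alpha_1 = -0.2$, $\beta = 1$, $\rho = 0.5$), the sample size, and all variable names are assumptions chosen only for illustration. It generates a weakly exogenous random walk for $z_t$, builds $y_t$ from the error-correction equation with $\alpha_2 = 0$, estimates the unrestricted regression of $\Delta y_t$ on $y_{t-1}$, $z_{t-1}$, and $\Delta z_t$ by OLS, and then backs out $\alpha_1$, $\beta$, and $\rho$ from the three estimated coefficients.

```python
import numpy as np

# Hypothetical parameter values chosen only for illustration.
ALPHA1, BETA, RHO = -0.2, 1.0, 0.5
T = 5000

rng = np.random.default_rng(0)
e2 = rng.standard_normal(T)   # innovations in z (alpha_2 = 0, so dz_t = e2_t)
v = rng.standard_normal(T)    # innovation in y that is orthogonal to e2

z = np.cumsum(e2)             # z_t is a pure random walk (weakly exogenous)
y = np.zeros(T)
for t in range(1, T):
    dz = z[t] - z[t - 1]
    # Error-correction equation with alpha = alpha_1 because alpha_2 = 0
    y[t] = y[t - 1] + ALPHA1 * (y[t - 1] - BETA * z[t - 1]) + RHO * dz + v[t]

# Unrestricted ADL regression: dy_t on y_{t-1}, z_{t-1}, dz_t
dy = np.diff(y)
X = np.column_stack([y[:-1], z[:-1], np.diff(z)])
b1, b2, b3 = np.linalg.lstsq(X, dy, rcond=None)[0]

alpha1_hat = b1            # coefficient on y_{t-1} equals alpha_1
beta_hat = -b2 / b1        # coefficient on z_{t-1} equals -alpha_1 * beta
rho_hat = b3               # coefficient on dz_t equals rho
print(alpha1_hat, beta_hat, rho_hat)
```

With a large simulated sample the three recovered values should lie close to the assumed ones; setting a nonzero $\alpha_2$ in the data-generating step would break the mapping, which is the point of the weak exogeneity condition.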


Although weak exogeneity allows the model to be identified, there is still the issue
of properly testing (6.68) for cointegration. Since $\{y_t\}$ and $\{z_t\}$ are I(1), the test
statistics of the null hypothesis $\beta_1 = 0$ and $\beta_2 = 0$ in (6.67) are nonstandard and need to be
tabulated. The usual way to test for cointegration is to use the t-statistic for the null
hypothesis $\beta_1 = 0$ in (6.67). After all, if $\beta_1 = 0$, there is no error-correction so that
$y_t$ is not cointegrated with $z_t$. Table F in the Supplementary Manual uses the work
of Ericsson and MacKinnon (2002) to calculate the appropriate critical values
necessary to determine whether $\beta_1 < 0$. The critical values depend on the number of I(1)
regressors in the model (denoted by k), the adjusted sample size $T^a$, and the form of
the deterministic regressors. For example, if you have an adjusted sample size with 100
observations and estimate a model with an intercept (d = 1) and two weakly exogenous
variables (k = 3), Table F indicates that the appropriate critical values to test the null
hypothesis $\beta_1 = 0$ are -4.181, -3.538, and -3.205 at the 1, 5, and 10% significance
levels, respectively.

If you compare (6.67) with (6.63), you can see the benefit of employing weak
exogeneity. Since the two representations are equivalent, $e_{1t}$ is composed of $\rho \Delta z_t$ and $v_t$.
Since the error term in (6.67) will have a smaller variance than the error term in (6.63), the
coefficients of (6.67) can be estimated with more precision than the coefficient of (6.63). A second
benefit ascribed to estimating such a model is that the coefficients of $y_{t-1}$ and $z_{t-1}$
are unrestricted. As such, the short-run dynamics for $\Delta y_t$ are not dictated by the long-run
equilibrium relationship $y_{t-1} = \beta z_{t-1}$. In the Engle-Granger and Johansen approaches,
the so-called Common Factor Restriction forces the short-run changes in $\Delta y_t$ to be a
constant proportion of the previous period's deviation from long-run equilibrium.


Inference on the Cointegrating Vector 


Suppose you assume that weak exogeneity holds and conclude that the variables are
cointegrated (so that $\alpha_1 < 0$ and $\alpha_2 = 0$). As such, it is possible to write (6.64) and
(6.67) as

$$\Delta y_t = \alpha_1 (y_{t-1} - \beta z_{t-1}) + \rho \Delta z_t + v_t \tag{6.69}$$

and

$$\Delta z_t = e_{2t} \tag{6.70}$$


Now the question becomes: Can you conduct inference on $\alpha_1$ and $\rho$ in (6.69) using
standard t-tests and F-tests? The answer, quite possibly, is yes! Since all variables in
(6.69) are stationary, we are really operating within a standard OLS regression
framework. A simultaneity problem exists if the regressors appearing in (6.69) depend on
the error term $v_t$. Clearly, the I(0) variable $y_{t-1} - \beta z_{t-1}$ is predetermined so that there
is no need to worry about the influence of $v_t$ on the error-correction term. Hence, the
key issue concerns the contemporaneous relationship between $\Delta y_t$ and $\Delta z_t$. If $\Delta z_t$ is
unaffected by innovations in $\Delta y_t$, it is appropriate to conduct inference on (6.69) using
standard t-tests and F-tests.

Recall that the particular orthogonalization used in (6.65) is such that $e_{1t} = \rho e_{2t} + v_t$
where $e_{2t}$ and $v_t$ are uncorrelated. This is actually a Choleski decomposition in that
$\Delta z_t$ does not respond to innovations in $\Delta y_t$ but $\Delta y_t$ responds to innovations in $\Delta z_t$. It
should be clear that the actual error structure has this Choleski form only if $c_{21} = 0$. In
other words, if $c_{21} = 0$, (6.65) is equivalent to $e_{1t} = \rho e_{2t} + \varepsilon_{yt}$ and $e_{2t} = \varepsilon_{zt}$. Given that
$\Delta z_t = e_{2t}$ does not depend on $\varepsilon_{yt}$, there is no feedback from $\Delta y_t$ to $\Delta z_t$ so that it is
possible to use standard inference on (6.68) or (6.69).

Thus, testing restrictions on $\alpha_1$ is straightforward since it is the coefficient on the
I(0) variable $(y_{t-1} - \beta z_{t-1})$. As such, given that $\alpha_1 \neq 0$, it is appropriate to form
confidence intervals on $\alpha_1$ using a standard t-distribution. Similarly, given that $\beta \neq 0$, $\alpha_1\beta$ can
be written as the coefficient on the I(0) variable $(y_{t-1}/\beta - z_{t-1})$. Inference on $\alpha_1\beta$ can
also be conducted using a t-distribution. Finally, note that $\rho$ is the coefficient on the
stationary variable $\Delta z_t$. Hence, it is appropriate to construct confidence intervals for $\rho$
using a t-distribution.

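It is straightforward to generalize these results. Since $z_t$ can actually be a vector of
I(1) variables, you can estimate (6.67) for $y_t$ and a set of weakly exogenous variables $z_t$.
For example, with two weakly exogenous variables, $z_{1t}$ and $z_{2t}$, the error-correction
model generalizes to

$$\Delta y_t = \alpha_1 (y_{t-1} - \beta_1 z_{1t-1} - \beta_2 z_{2t-1}) + \beta_3 \Delta z_{1t} + \beta_4 \Delta z_{2t} + v_t$$

so that you estimate a model of the form

$$\Delta y_t = \alpha_1 y_{t-1} + b_1 z_{1t-1} + b_2 z_{2t-1} + \beta_3 \Delta z_{1t} + \beta_4 \Delta z_{2t} + v_t$$

where $b_1 = -\alpha_1\beta_1$ and $b_2 = -\alpha_1\beta_2$ (equivalently, $\beta_1 = -b_1/\alpha_1$ and $\beta_2 = -b_2/\alpha_1$).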

To test for cointegration, use the t-statistic for the null hypothesis $\alpha_1 = 0$. Since
you have three I(1) variables in the model, obtain the critical values from Table F such
that k = 3. Of course, if we start from a higher order process, additional lags of $\Delta y_{t-i}$,
$\Delta z_{1t-i}$, and $\Delta z_{2t-i}$ should be added to the equation. As in the two-variable case, you
need to assume that $\Delta y_t$ has no contemporaneous effects on any values of the $\Delta z_{it}$.


11. COMPARING THE THREE METHODS


In this section, we compare the Engle-Granger, Johansen, and ADL tests for
cointegration using the three-month Treasury bill and 10-year interest rates in the file
QUARTERLY.XLS. Although we know that the spread acts as a stationary variable,
the point of this section is to illustrate the use of the three testing methodologies. Since
we have already verified that the individual rates act as I(1) processes, we can skip the
preliminary step of pretesting for unit roots. To keep the discussion on point,
reasonable lag lengths for each test are simply reported. You can verify them in the exercises
at the end of the chapter.


The Engle-Granger Methodology 


Given that each rate acts as a unit-root process, we can begin by estimating the long-run
equilibrium relationship

$$r_{Lt} = 1.642 + 0.915 r_{St} \tag{6.71}$$
          (13.23)  (43.15)


Next, we test the residuals from (6.71) for stationarity by estimating an equation in
the form (6.32). If you experiment with various lag lengths, you will find that the
lag-length tests suggest either a three-lag model or a one-lag model. If we adopt the SBC and
use one lagged change, we obtain

$$\Delta \hat{e}_t = -0.155 \hat{e}_{t-1} + 0.201 \Delta \hat{e}_{t-1}$$
                     (-4.45)          (2.96)


In a model with 2 variables and 208 usable observations, the 5% critical value
shown in Table C is -3.368 and the 1% value is -3.95. As such, we can reject the
null hypothesis of no cointegration. Since we are making no assumption concerning
weak exogeneity, it is clearly possible to carry out the analysis using $r_{St}$ as the left-hand
side variable. Reversing the variables in (6.71) yields

$$r_{St} = -1.103 + 0.982 r_{Lt}$$
          (-7.04)   (43.15)


In this form, the Engle-Granger test also supports the finding of cointegration
since the regression of the residuals yields

$$\Delta \hat{e}_t = -0.172 \hat{e}_{t-1} + 0.219 \Delta \hat{e}_{t-1}$$
                     (-4.78)          (3.24)
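
The two steps used above can be sketched in code. This is an illustrative implementation under stated assumptions, not the author's program: the helper names `ols` and `engle_granger_tstat` are hypothetical, the series are simulated rather than read from QUARTERLY.XLS, and the 5% value of -3.368 quoted in the text (for 208 observations) is used only as a rough benchmark for the simulated sample.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))

def engle_granger_tstat(y, z, lags=1):
    """t-statistic on e_{t-1} in: de_t = a1*e_{t-1} + sum_i a_{i+1}*de_{t-i}."""
    # Step 1: long-run regression of y on z with an intercept
    X = np.column_stack([np.ones_like(z), z])
    coef, _ = ols(X, y)
    e = y - X @ coef
    # Step 2: augmented Dickey-Fuller regression on the residuals (no intercept)
    de = np.diff(e)
    cols = [e[lags:-1]] + [de[lags - i:len(de) - i] for i in range(1, lags + 1)]
    b, se = ols(np.column_stack(cols), de[lags:])
    return b[0] / se[0]

# Simulated data (an assumption): z is I(1) and y is cointegrated with it.
rng = np.random.default_rng(1)
T = 500
z = np.cumsum(rng.standard_normal(T))
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.standard_normal()   # stationary deviation
y = 1.6 + 0.9 * z + u

t_stat = engle_granger_tstat(y, z, lags=1)
print(t_stat, t_stat < -3.368)
```

Because the residuals are estimated, the t-statistic must be compared with Engle-Granger critical values such as those in Table C rather than with the usual Dickey-Fuller tables.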


Notice that the two estimates of the long-run equilibrium relationship are
somewhat different from each other. However, it is not possible to conduct inference on either
of these cointegrating vectors unless you use the methods discussed in Appendix 6.2 in
the Supplementary Manual. As an exercise, you can repeat the cointegration test using
a three lag specification.


The Johansen Methodology


Let $x_t$ denote the vector $[r_{Lt}, r_{St}]'$. If you estimate the unrestricted VAR in the form
of (6.53) (i.e., if you estimate the VAR $x_t = A_0 + \sum A_i x_{t-i}$) you should find that the
SBC selects a lag length of one whereas the AIC and general-to-specific methodology
select a lag length of eight. Again, for expositional purposes it is simplest to report the
results of the one lag model. Given this lag length, it is possible to estimate the model
in the form of (6.54). Since the interest rates do not continually increase or decrease
over time, it seems reasonable to constrain the drift terms so that a constant appears in
the cointegrating relationship. The estimated value of the $\pi^*$ matrix is such that


va —1.048 1.102 0.956 ]} ‘4 
Z ži 5 1-0.446 0.100 2.133 ||% 

The characteristic roots are such that $\lambda_1 = 0.1295$ and $\lambda_2 = 0.0136$ so that
$-T \ln(1 - \lambda_1) = 29.13$ and $-T \ln(1 - \lambda_2) = 2.87$. To test the null hypothesis of no
cointegration against the general alternative of 1 or 2 cointegrating vectors, compare
the sum 29.13 + 2.87 = 32.00 to the 5% critical value of the $\lambda_{trace}$ statistic shown in
Table E. Since 32.00 exceeds the critical value of 19.96, reject the null and conclude
that there is at least one cointegrating vector. To test the null of one cointegrating
vector against the alternative of two cointegrating vectors, compare the sample value
of 2.87 to the 5% critical value of 9.24. As such, we can conclude that there is only
one cointegrating vector.
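
The arithmetic behind these two tests can be reproduced in a few lines. The sketch below is illustrative only: the function name `johansen_stats` is hypothetical, and the sample size T = 210 is an assumption backed out from the reported statistics, since the text does not state T for this estimation.

```python
import math

def johansen_stats(eigenvalues, T):
    """Return (lambda_trace, lambda_max) lists for r = 0, 1, ..., n-1."""
    roots = sorted(eigenvalues, reverse=True)
    terms = [-T * math.log(1.0 - lam) for lam in roots]
    lambda_max = terms                                   # null of r against r + 1
    lambda_trace = [sum(terms[r:]) for r in range(len(terms))]  # null of r against more than r
    return lambda_trace, lambda_max

# T = 210 is an assumption inferred from the reported test statistics.
trace, lmax = johansen_stats([0.1295, 0.0136], T=210)
print([round(x, 2) for x in trace], [round(x, 2) for x in lmax])

# Decision rule using the 5% critical values quoted in the text.
reject_no_cointegration = trace[0] > 19.96
reject_one_vector = trace[1] > 9.24
print(reject_no_cointegration, reject_one_vector)
```

Small differences in the second decimal relative to the reported values can arise because the eigenvalues are rounded to four places.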

Normalizing the cointegrating vector with respect to $r_{Lt}$ yields

$$r_{Lt} = 0.912 + 1.051 r_{St}$$
          (2.65)  (17.88)


A key difference between this estimate of the long-run equilibrium relationship
and those from the Engle-Granger test is that standard inference can be performed
on the coefficients of the cointegrating vector. For example, the likelihood ratio test
statistic for the null hypothesis that the coefficients on the long-term and short-term rates
both equal unity is only 0.643 with a prob-value of 0.422. As such, we can conclude that
the restriction is not binding. Hence, in the long run, the 10-year rate tends to move
1:1 with the short-term rate. If you re-estimate the model imposing the restriction, you
should find


$$\Delta r_{Lt} = -0.098 (r_{Lt-1} - 1.17 - r_{St-1}) + 0.185 \Delta r_{Lt-1} + 0.002 \Delta r_{St-1} \tag{6.72}$$
                 (-2.32)          (-7.10)                 (1.88)                   (0.03)

$$\Delta r_{St} = 0.084 (r_{Lt-1} - 1.17 - r_{St-1}) + 0.053 \Delta r_{Lt-1} + 0.229 \Delta r_{St-1} \tag{6.73}$$
                 (1.51)          (-7.10)                 (0.41)                   (2.23)


If you test for the presence of the intercept, you will find that the constant term in
the cointegrating vector is highly significant. The important point is that the t-statistics
on the error-correcting terms imply that the long-term rate adjusts to the discrepancy
from the long-run equilibrium relationship, but the short-term rate does not. In other words,
$r_{St}$ is weakly exogenous. Consider the dynamic adjustment mechanism if there is a
positive one-unit discrepancy from the long-run equilibrium relationship. The estimates imply
that the long-term rate falls by 0.098 units and that the short-term rate does none of the
adjusting. As such, the deviations from the long-run relationship are quite long lived.


The Error-Correction Methodology 


In contrast to the Engle-Granger and Johansen methodologies, to use the
error-correction test it is necessary to assume that one of the variables is weakly
exogenous. Suppose that we were certain that the short-term interest rate did none of
the adjustment necessary to restore the long-run equilibrium relationship. Given that
the short-term rate is weakly exogenous, we can estimate an equation in the form

$$\Delta r_{Lt} = \beta_0 + \beta_1 r_{Lt-1} + \beta_2 r_{St-1} + \beta_3 \Delta r_{St} + A_1(L)\Delta r_{Lt-1} + A_2(L)\Delta r_{St-1} + v_t \tag{6.74}$$


Equation (6.74) looks very much like (6.72) except that elements of the cointegrating
vector are unrestricted and the contemporaneous value of $\Delta r_{St}$ is included. Since we are
not treating all variables symmetrically, there is no need to constrain the lag length
represented by the polynomial $A_1(L)$ to be the same as that from $A_2(L)$. However, for this
case, it turns out that a lag length of six seems appropriate for each variable. Consider
the estimated equation

$$\Delta r_{Lt} = 0.113 - 0.171 r_{Lt-1} + 0.187 r_{St-1} + 0.612 \Delta r_{St} + A_1(L)\Delta r_{Lt-1} + A_2(L)\Delta r_{St-1} + v_t \tag{6.75}$$
                 (1.52)  (-4.45)          (4.80)           (15.92)
The key point to note is that the t-statistic for the null hypothesis $\beta_1 = 0$ is -4.45.
Given the presence of an intercept (d = 1), two I(1) variables (k = 2), and that the
estimation begins in 1961Q4 (T = 205), the adjusted sample size is $T^a = 205 - (2 \cdot 2 - 1) - 1 = 201$.
From Table F, the critical values at the 1, 5, and 10% significance levels
are approximately -3.834, -3.231, and -2.916, respectively. Hence, we can reject the
null hypothesis of no cointegration and conclude that the variables are cointegrated.
We can reparameterize (6.75) such that

$$\Delta r_{Lt} = -0.171 (r_{Lt-1} - 1.097 r_{St-1} - 0.661) + 0.612 \Delta r_{St} + A_1(L)\Delta r_{Lt-1} + A_2(L)\Delta r_{St-1} + v_t$$
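
The calculations in this step are simple enough to script. The following sketch is illustrative: the function names are hypothetical, and the general form $T - (2k - 1) - d$ for the adjusted sample size is inferred from the single calculation reported above rather than quoted from Table F. It computes the adjusted sample size, compares the t-statistic with the 5% critical value, and recovers the long-run relationship implied by the unrestricted coefficients.

```python
def adjusted_sample_size(T, k, d):
    """T^a = T - (2k - 1) - d, matching the calculation reported in the text."""
    return T - (2 * k - 1) - d

def long_run_relationship(b0, b1, b2):
    """Rewrite b0 + b1*rL + b2*rS as b1*(rL - slope*rS - const)."""
    slope = -b2 / b1
    const = -b0 / b1
    return slope, const

Ta = adjusted_sample_size(T=205, k=2, d=1)
slope, const = long_run_relationship(b0=0.113, b1=-0.171, b2=0.187)

t_stat, crit_5pct = -4.45, -3.231      # values reported in the text
cointegrated = t_stat < crit_5pct
print(Ta, round(slope, 3), round(const, 3), cointegrated)
```

Because the published coefficients are rounded to three decimals, the implied slope can differ slightly in the third decimal from the 1.097 shown in the reparameterized equation.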


In this particular example, all three approaches find that the variables are
cointegrated. The Engle-Granger approach indicates that the speed of adjustment parameter
is -0.155 (or -0.172), but does not indicate which variable (or variables) does the
adjustment. In response to a one-unit deviation from the long-run equilibrium,
the Johansen approach indicates that the long-term rate adjusts by -0.098 units while
the ECM approach indicates that it adjusts by -0.171 units. The Engle-Granger
approach does not allow us to readily perform inference on the cointegrating vector, but
the Johansen approach allows us to conclude that the two rates move 1:1 in the long run.

So long as we are willing to assume $\beta_2 \neq 0$, it is possible to perform inference on
the coefficient on $r_{St-1}$ in the long-run equilibrium relationship. Clearly, it would have
been possible to reparameterize (6.75) such that

$$\Delta r_{Lt} = -0.187 (0.914 r_{Lt-1} - r_{St-1} - 0.604) + 0.612 \Delta r_{St} + A_1(L)\Delta r_{Lt-1} + A_2(L)\Delta r_{St-1} + v_t \tag{6.76}$$

Hence, $\beta_2$ (= 0.187) is the coefficient on a stationary variable so that it has a
standard t-distribution. Given that the standard error of $\beta_2$ is 0.038, a ±1.96 standard
deviation band runs from 0.111 to 0.263. Alternatively, we could have performed an
F-test for the null hypothesis $\beta_1 = -\beta_2$ in (6.75). A traditional F-test is appropriate since
each coefficient has a t-distribution. With 1 degree of freedom in the numerator and
189 in the denominator, the sample value of F = 2.86 is significant at the 0.093 level.
If you re-estimate the model such that $\beta_1 = -\beta_2$, you should find


Ary, = -0.1751 — Fg; — 2-01) + 0.604Ars, +A (L)Ary,_, + Ar(L)Ars,_) + ¥; 


If you are willing to abstract from the stationary dynamics, it is clear how to trace
out the effects of a one-unit shock in $\Delta r_{St}$. All else equal, if $\Delta r_{St} = 1$, it follows that
$\Delta r_{Lt} = 0.604$. In period t + 1, it follows that the discrepancy from the long-run
equilibrium is -0.396 (= 0.604 - 1) and the change in the long rate is (-0.396)(-0.175) =
0.069. In subsequent periods, the long rate keeps rising by 17.5% of the discrepancy
from the long-run equilibrium. At this point, you could go on to perform the innovation
accounting by estimating an equation of the form $\Delta r_{St} = A_3(L)\Delta r_{Lt-1} + A_4(L)\Delta r_{St-1} + e_{2t}$.
Note that the equation is in first differences since the $\Delta r_{St}$ equation does not contain
an error-correction term. Also note that the assumption that $r_{St}$ is weakly exogenous
implies a causal ordering of the innovations in that a $v_t$ shock has no contemporaneous
effect on $\Delta r_{St}$ but an $e_{2t}$ shock directly affects $\Delta r_{Lt}$.
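
The adjustment path described above can be traced mechanically. This is a minimal sketch under the same simplification as the text (stationary lag dynamics are ignored); it additionally assumes that the short rate stays at its new level after the shock, and the function name `adjustment_path` is hypothetical.

```python
def adjustment_path(shock=1.0, impact=0.604, speed=-0.175, periods=8):
    """Trace the long rate after a one-time change in the short rate.

    Stationary lag dynamics are ignored, as in the text, and the short rate
    is assumed to remain at its new level after the shock.
    """
    gap = impact * shock - shock          # r_L - r_S relative to equilibrium
    path = [(0, impact * shock, gap)]     # (period, change in r_L, gap afterward)
    for h in range(1, periods + 1):
        d_long = speed * gap              # error-correction response
        gap += d_long
        path.append((h, d_long, gap))
    return path

path = adjustment_path()
for period, d_long, gap in path:
    print(period, round(d_long, 3), round(gap, 3))
```

Each period the long rate closes 17.5% of the remaining gap, so the discrepancy shrinks geometrically by a factor of 0.825.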


12. SUMMARY AND CONCLUSIONS 


Many economic theories imply that a linear combination of certain nonstationary
variables must be stationary. For example, if the variables $\{x_{1t}\}$, $\{x_{2t}\}$, and $\{x_{3t}\}$ are I(1)
and the linear combination $e_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t}$ is stationary, the variables
are said to be cointegrated of order (1, 1). The vector $(\beta_0, \beta_1, \beta_2, \beta_3)$ is called the
cointegrating vector. Cointegrated variables share the same stochastic trends and so cannot
drift too far apart. Cointegrated variables have an error-correction representation such
that each responds to the deviation from "long-run equilibrium."

One way to check for cointegration is to examine the residuals from the long-run
equilibrium relationship. If these residuals have a unit root, the variables cannot be
cointegrated of order (1, 1). Another way to check for cointegration among I(1) variables
is to estimate a VAR in first differences and include the lagged level of the variables.
The Johansen methodology uses the $\lambda_{trace}$ and $\lambda_{max}$ test statistics to determine if the
variables are cointegrated and the number of cointegrating vectors. These tests are
sensitive to the presence of the deterministic regressors included in the cointegrating
vector(s). Restrictions on the cointegrating vector(s) and/or the speed of adjustment
parameters can be tested using $\chi^2$ statistics. You should be aware of the role of the
deterministic regressors in a cointegration framework. Johansen (1994) shows how to
test to determine whether there is a deterministic trend, drift terms outside of the
cointegrating vector, or constants that all appear in the cointegrating vector. A third way
to test for cointegration is to estimate the error-correction model. If only one
variable adjusts to the discrepancy from the long-run equilibrium relationship, it can be
preferable to estimate an autoregressive distributed lag model. It is straightforward to
estimate the model using OLS and to perform hypothesis tests on the coefficients of
the cointegrating vector. For more complicated situations, Appendix 6.2 discusses the
Phillips-Hansen (1990) method of modeling in a single equation framework.


QUESTIONS AND EXERCISES 


1. Let equations (6.14) and (6.15) contain intercept terms such that

$$y_t = a_{10} + a_{11} y_{t-1} + a_{12} z_{t-1} + \varepsilon_{yt} \quad \text{and} \quad z_t = a_{20} + a_{21} y_{t-1} + a_{22} z_{t-1} + \varepsilon_{zt}$$

a. Show that the solution for $y_t$ can be written as

$$y_t = \frac{(1 - a_{22}L)\varepsilon_{yt} + (1 - a_{22})a_{10} + a_{12}L\varepsilon_{zt} + a_{12}a_{20}}{(1 - a_{11}L)(1 - a_{22}L) - a_{12}a_{21}L^2}$$

b. Find the solution for $z_t$.

c. Suppose that $y_t$ and $z_t$ are CI(1, 1). Use the conditions in (6.19), (6.20), and (6.21) to
write the error-correcting model. Compare your answer to (6.22) and (6.23). Show that
the error-correction model contains an intercept term.

d. Show that $\{y_t\}$ and $\{z_t\}$ have the same deterministic time trend (i.e., show that the slope
coefficients of the time trends are identical).

e. What is the condition such that the slope of the trend is zero? Show that this condition is
such that the constant can be included in the cointegrating vector.

f. Modify (6.26) so that each equation has an explicit intercept. Specifically, let $x_t = A_0 +
A_1 x_{t-1} + \varepsilon_t$ where $A_0$ is an (n × 1) vector with elements $a_{i0}$. Suppose that the rank of $\pi$
is 1. How are the solutions to (6.28) affected by the presence of the intercepts? How is the
error-correction representation in (6.29) affected?

2. The data file COINT6.XLS contains the three simulated series used in Sections 5 and 9. 

a. Use the data to reproduce the results in Section 5. 

b. Obtain the impulse responses and variance decompositions using the ordering such that
$y_t \to z_t \to w_t$. Do these seem reasonable, given the way in which the variables were
constructed?

c. Use the data to reproduce the results in Section 9.

d. Examine Table 6.1. Show that $y_t$ and $z_t$, but not $w_t$, are weakly exogenous.

e. Use the data to compare the ECM test to the Engle-Granger and Johansen tests treating
$y_t$ and $z_t$ as weakly exogenous.

3. In Question 9 of Chapter 4 you were asked to use the data on QUARTERLY.XLS to estimate
the regression equation

$$INDPROD_t = 30.48 + 0.04\, M1NSA_t$$
              (29.90)  (36.58)

a. Use the Engle-Granger test to show that the regression is spurious.

b. Examine the scatter plot of $INDPROD_t$ against $M1NSA_t$. How do you explain the fact that
$R^2$ is close to unity and that the t-statistic on the money supply is 36.58?

c. Use the data on the file labeled REAL.XLS. Denote the natural logs of real GDP and
consumption by $ly_t$ and $lc_t$, respectively. Estimate the regression

$$lc_t = -0.962 + 1.06\, ly_t \qquad R^2 = 0.999$$
        (-51.78)  (494.19)

If you perform the Engle-Granger test using four lags, you should find

$$\Delta \hat{e}_t = -0.092 \hat{e}_{t-1} + \sum_{i=1}^{4} \beta_i \Delta \hat{e}_{t-i}$$

The t-statistic for $\hat{e}_{t-1}$ is -3.48. How do you interpret the consumption-income
relationship?

4. The file labeled QUARTERLY.XLS contains the interest rates paid on 3-month, 5-year,
and 10-year U.S. government securities. The data run from 1960Q1 to 2012Q4. The
variables are labeled TBILL, R5, and R10, respectively.

a. Pretest the variables to show that the rates all act as unit root processes. Specifically,
perform augmented Dickey-Fuller tests using the lag length selected by deleting lags
until the t-statistic on the last lag is significant at the 5% level. If you include an intercept
(but no time trend) you should obtain:


Series    Lags    Estimated a_1    t-statistic

TBILL     7       -0.028           -1.61
R5        5       -0.013           -1.03
R10       7       -0.011           -0.78


b. Estimate the cointegrating relationships using the Engle-Granger procedure. Perform
augmented Dickey-Fuller tests on the residuals. Using TBILL as the "dependent"
variable, you should find

$$TBILL_t = 0.367 - 1.91\, R5_t + 2.74\, R10_t$$
           (2.31)  (-13.44)      (20.78)

where t-statistics are within parentheses.

Perform the Engle-Granger test on the residuals from the equation above. Why is it
appropriate to use eight lags in the augmented form of the test? If you use eight lags,
you should find that the coefficient on the lagged residual (i.e., $e_{t-1}$) is -0.276 with a
t-statistic of -4.08. The 5% critical value is about -3.76. Based on this data, do you
conclude that the variables are cointegrated?

c. Repeat part b using R10 as the dependent variable. If you use 6 lags in the augmented
form of the Engle-Granger test (i.e., estimate $\Delta e_t = a_1 e_{t-1} + \ldots$) you should find
$a_1 = -0.105$ and the t-statistic is -2.34. Using R10 as the dependent variable, are the three
interest rates cointegrated?

d. Estimate the model using the Johansen procedure. Use 7 lags and include a constant in
the cointegrating vector. You should find the following:


Trace Tests                                   Maximum Eigenvalue Tests
Null    Alternative  λ_trace  5% Value        Null    Alternative  λ_max   5% Value
r = 0   r ≥ 1        45.50    34.91           r = 0   r = 1        37.83   22.00
r ≤ 1   r ≥ 2         7.67    19.96           r = 1   r = 2         6.89   15.67
r ≤ 2   r = 3         0.78     9.24           r = 2   r = 3         0.78    9.24


i. Explain why the $\lambda_{trace}$ test strongly suggests there is exactly one cointegrating vector.
ii. To what extent is this result reinforced by the $\lambda_{max}$ test?


Verify that the cointegrating vector is

$$1.99\, TBILL_t + 0.879\, R5_t - 1.67\, R10_t + 0.820 = 0$$

Compare this result to your answer in part b.

e. Check to determine whether the individual interest rate pairs are cointegrated. In
particular, is $R5_t$ cointegrated with $R10_t$?

f. Why might you be wary about testing for cointegration using the ADL test developed in 
Section 10? 

5. In Question 4, the Engle-Granger methodology found that the long-run equilibrium
relationship for the three interest rates was

$$TBILL_t = 0.367 - 1.91\, R5_t + 2.74\, R10_t$$

a. Estimate an error-correcting model using 2 lagged changes of each variable. Use the
residuals from this long-run equilibrium relationship as the error-correction term and do
not include intercepts. You should find that the error-corrections are such that

$\Delta TBILL_t = 0.062 e_{t-1} + \ldots$    t-statistic for the error-correction term: 0.73
$\Delta R5_t = -0.161 e_{t-1} + \ldots$      t-statistic for the error-correction term: -2.94
$\Delta R10_t = -0.162 e_{t-1} + \ldots$     t-statistic for the error-correction term: -2.52

where $e_{t-1}$ is the lagged residual from your estimate of the equilibrium relationship.

i. Verify that the multivariate AIC selects a model with 2 lagged changes of each
variable. Perform the appropriate diagnostic tests on the system. In particular, determine
whether the three series of residuals appear to be white noise. Are the lag lengths
unnecessarily short?

ii. Discuss the nature of the adjustment. Are any of the rates weakly exogenous? In
response to a deviation from the long-run relationship, how are the three rates
predicted to change?

b. Use a Choleski decomposition such that the T-bill rate is causally prior to $R5_t$ and $R5_t$ is
causally prior to $R10_t$.

c. Obtain the variance decompositions using the same ordering as you used in part b. Show
that the preponderance of the forecast error variance of each rate is primarily due to the
T-bill rate.


6. Suppose you estimate $\pi$ to be

$$\pi = \begin{bmatrix} 0.6 & -0.5 & 0.2 \\ 0.3 & -0.25 & 0.1 \\ 1.2 & -1.0 & 0.4 \end{bmatrix}$$

a. Show that the determinant of $\pi$ is zero.

b. Show that two of the characteristic roots are zero and that the third is 0.75.

c. Let $\beta' = (3, -2.5, 1)$ be the single cointegrating vector normalized with respect to $x_{3t}$.
Find the (3 × 1) vector $\alpha$ such that $\pi = \alpha\beta'$. How would $\alpha$ change if you normalized $\beta$
with respect to $x_{1t}$?

d. Describe how you could test the restriction $\beta_1 + \beta_2 = 0$.

Now suppose you estimate $\pi$ to be

$$\pi = \begin{bmatrix} 0.8 & 0.4 & 0.0 \\ 0.2 & 0.1 & 0.0 \\ 0.75 & 0.25 & 0.5 \end{bmatrix}$$

e. Show that the three characteristic roots are 0.0, 0.5, and 0.9.

f. Select $\beta$ such that

$$\beta = \begin{bmatrix} 0.8 & 0.75 \\ 0.4 & 0.25 \\ 0.0 & 0.5 \end{bmatrix}$$

Find the (3 × 2) matrix $\alpha$ such that $\pi = \alpha\beta'$.


7. Suppose that $x_{1t}$ and $x_{2t}$ are integrated of orders 1 and 2, respectively. You are to sketch the
proof that any linear combination of $x_{1t}$ and $x_{2t}$ is integrated of order 2. Toward this end:

a. Allow $x_{1t}$ and $x_{2t}$ to be the random walk processes: $x_{1t} = x_{1t-1} + \varepsilon_{1t}$ and $x_{2t} = x_{2t-1} + \varepsilon_{2t}$.

i. Given the initial conditions $x_{10}$ and $x_{20}$, show that the solutions for $x_{1t}$ and $x_{2t}$ have
the form $x_{1t} = x_{10} + \sum \varepsilon_{1t-i}$ and $x_{2t} = x_{20} + \sum \varepsilon_{2t-i}$.

ii. Show that the linear combination $\beta_1 x_{1t} + \beta_2 x_{2t}$ will generally contain a stochastic
trend.

iii. What assumption is necessary to ensure that $x_{1t}$ and $x_{2t}$ are CI(1, 1)?

b. Now let $x_{2t}$ be integrated of order 2. Specifically, let $\Delta x_{2t} = \Delta x_{2t-1} + \varepsilon_{2t}$. Given initial
conditions for $x_{20}$ and $x_{21}$, find the solution for $x_{2t}$. (You may allow $\varepsilon_{1t}$ and $\varepsilon_{2t}$ to be
perfectly correlated.)

i. Is there any linear combination of $x_{1t}$ and $x_{2t}$ that contains only a stochastic trend?
ii. Is there any linear combination of $x_{1t}$ and $x_{2t}$ that does not contain a stochastic trend?

c. Provide an intuitive explanation for the statement: If $x_{1t}$ and $x_{2t}$ are integrated of orders
$d_1$ and $d_2$ where $d_2 > d_1$, any linear combination of $x_{1t}$ and $x_{2t}$ is integrated of order $d_2$.

8. Chapter 6 of the Programming Manual uses the variables Tbill and Tb1yr on the file
QUARTERLY.XLS to illustrate both the Johansen and Engle-Granger cointegration tests.

a. Verify that the t-statistics of the Dickey—Fuller tests using 7 lags are — 1.61304 and 
—1.39320 for the Tbill (r) and Tblyr (r,,), respectively. 

b. Estimate the long-run relationship alternatively using Tbill and Tb/yr as the “‘indepen- 
dent” variable. For r,, as the left-hand-side variable, you should find r,, = —0.187 + 
0.936r,,- 

c. Estimate an equation in the form of (6.32) using 6 lags. The estimate of a, should be 
—0.372 with a t-statistic of —4.78. Use Table C to determine whether the variables are 
cointegrated. What happens if you use r,, as the left-hand-side variable in the long-run 
relationship? 

d. Estimate the error-correction model and obtain the impulse response functions. Your 
results should look like those in Section 6.1 of the Programming Manual. 

e. If you perform the Johansen test using seven lags you should find that the eigenvalues are
0.1523 and 0.0078. Calculate the $\lambda_{max}$ and the $\lambda_{trace}$ statistics as in (6.55) and (6.56). Use
your results to determine the number of cointegrating vectors.


9. The file COINT_PPP.XLS contains monthly values of the Japanese, Canadian, and Swiss
consumer price levels and the bilateral exchange rates with the United States. The file
also contains the U.S. consumer price level. The names on the individual series should
be self-evident. For example, JAPANCPI is the Japanese price level and JAPANEX is the
bilateral Japanese/U.S. exchange rate. The starting date for all variables is January 1974
while the availability of the variables is such that most end near the end of 2013. The price
indices have been normalized to equal 100 in January 1973 and only the U.S. price index is
seasonally adjusted.

a. Form the log of each variable and pretest each for a unit root. Can the null hypothesis of
a unit root be rejected for any of the series? How might you proceed if you found that the
U.S. CPI was trend stationary?

b. Form the log of each variable. Estimate the long-run relationship between Japan and the
U.S. as


log(japanex) = 9.97 — 0.104 log(japancpi) — 0.768 log(uscpi) 
(27.25) (—0.98) (—17.05) 


i. Do the point estimates of the slope coefficients seem to be consistent with long-run 
PPP? 
ii. From the t-statistics, can you conclude that the Japanese CPI is not significant at the 
5% level? 
c. Let $u_t$ denote the residuals from the long-run relationship. Use these residuals to perform 
the Engle-Granger test for cointegration. If you use eleven lagged changes, you should 
find 

$$\Delta u_t = -0.025 u_{t-1} + \sum_{i=1}^{11} a_i \Delta u_{t-i} + \varepsilon_t$$

The t-statistic on the coefficient for $u_{t-1}$ is -3.44. From Table C, with three variables and 
457 usable observations, the 5% and 10% critical values are about -3.760 and -3.464, 
respectively. Do you conclude that long-run PPP fails? 
d. Repeat parts i and ii using Canada and Switzerland. If you use the residuals from the 
long-run equilibrium relationships you should find 
Canada (10 lags): $\Delta u_t = -0.012 u_{t-1} + \sum_{i=1}^{10} a_i \Delta u_{t-i} + \varepsilon_t$; t-stat. = -1.89 
Switzerland (10 lags): $\Delta u_t = -0.027 u_{t-1} + \sum_{i=1}^{10} a_i \Delta u_{t-i} + \varepsilon_t$; t-stat. = -3.02. 
e. Although (at conventional significance levels) we reject the null hypothesis of long-run 
PPP between Japan and the United States, estimate the error-correction model for 
$ljapanex_t$. If you use 11 lagged changes of each variable, you should find 

$$\Delta ljapanex_t = -0.005 - 0.030 \hat{e}_{t-1} + \sum_i \beta_{1i} \Delta ljapanex_{t-i} + \sum_i \beta_{2i} \Delta ljapancpi_{t-i} + \sum_i \beta_{3i} \Delta luscpi_{t-i}$$

where $\hat{e}_{t-1}$ is the residual from the equilibrium relationship above and eleven lagged 
changes are used for each variable. The t-statistic on the error correction term is -3.54. 
Which of the variable(s) can be said to be weakly exogenous? 

FIGURE 6.5 Responses of the Japanese Exchange Rate 

f. Obtain the impulse response functions using the ordering $luscpi_t \rightarrow ljapancpi_t \rightarrow ljapanex_t$. As in 
Figure 6.5, you should find that the U.S. price shock has little effect on the exchange rate 
but that a shock to the Japanese price level causes the yen to depreciate. The response of 
the exchange rate to its own shock is immediate and permanent. 

g. Are the results of the cointegration test sensitive to the normalization (i.e., which of the 
variables is used as the ‘dependent’ variable) used in the equilibrium regression? 

10. In Question 9d, you were asked to use the Engle-Granger procedure to test for PPP among the 
variables log(canex), log(cancpi), and log(uscpi). 

a. Now use the Johansen methodology and constrain the constant to the cointegrating vector 
to obtain: 

Rank    λ_i       λ_max     λ_trace 
1       0.0535    25.647    35.987 
2       0.0138     6.460    10.339 
3       0.0083     3.879     3.879 

Use Table E to show that there is a single cointegrating vector. 
b. Consider the estimated cointegrating vector: 


-0.949 log(canex) - 6.484 log(cancpi) + 1.600 log(uscpi) + 31.653 = 0 


Normalize with respect to the exchange rate. Does the long-run relationship seem to be 
consistent with PPP? 
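
The arithmetic behind the Johansen statistics in Questions 8e and 10a can be sketched in a few lines of Python. The sketch assumes that (6.55) and (6.56) are the usual definitions, $\lambda_{max}(r, r+1) = -T \ln(1 - \hat{\lambda}_{r+1})$ and $\lambda_{trace}(r) = -T \sum_{i=r+1}^{n} \ln(1 - \hat{\lambda}_i)$. The function name and the sample size T = 466 are illustrative assumptions and are not given in the questions.

```python
import math

def johansen_stats(eigenvalues, T):
    """Return (lambda_max, lambda_trace) lists for r = 0, 1, ..., n-1."""
    lams = sorted(eigenvalues, reverse=True)
    # lambda_max(r, r+1) = -T * ln(1 - lambda_{r+1})
    lam_max = [-T * math.log(1.0 - lam) for lam in lams]
    # lambda_trace(r) = -T * sum_{i=r+1}^{n} ln(1 - lambda_i),
    # i.e. the sum of the remaining lambda_max values
    lam_trace = [sum(lam_max[r:]) for r in range(len(lams))]
    return lam_max, lam_trace

# Eigenvalues from the table in Question 10a; T = 466 is a hypothetical sample size.
lam_max, lam_trace = johansen_stats([0.0535, 0.0138, 0.0083], T=466)
for r, (m, tr) in enumerate(zip(lam_max, lam_trace)):
    print(f"r = {r}: lambda_max = {m:.3f}, lambda_trace = {tr:.3f}")
```

Because each trace statistic is the sum of the remaining maximum-eigenvalue statistics, the columns of the table in Question 10a can be checked against one another without knowing T.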


CHAPTER 7 


NONLINEAR MODELS 
AND BREAKS 


Learning Objectives 


1. Introduce nonlinear models and compare them to linear ARMA models. 
   Show that nonlinear models can characterize the behavior of many economic 
   variables. 

2. Introduce some simple nonlinear models including the generalized autoregressive 
   and bilinear models. 

3. Develop a number of tests that can detect the presence of nonlinear adjustment. 
   Explain the difficulties of testing when there are unidentified nuisance 
   parameters. 

4. Explain the basic threshold autoregressive model. 

5. Consider several extensions of the threshold autoregressive model and introduce 
   the threshold regression model. 

6. Illustrate threshold models using examples of the unemployment rate, 
   Taylor rule, and capital stock in the pork sector. 

7. Explain the basic smooth transition autoregressive (STAR) model. Show 
   how to pretest for STAR models. 

8. Discuss artificial neural network and Markov switching models. 

9. Estimate an LSTAR model using simulated data and discuss an ESTAR 
   estimate of the real exchange rate. 

10. Explain how to obtain impulse responses in a nonlinear model. Illustrate 
    these impulses from a model of U.S. GDP and a model of transnational 
    terrorism. 

11. Consider the issue of testing for a unit root in a nonlinear model. 

12. Show that models with endogenous structural breaks have a number of 
    important similarities to models exhibiting nonlinearity. Consider models 
    with nonlinear breaks. 


Economic theory suggests that a number of important time-series variables should 
exhibit nonlinear behavior. For example, the observation that wages display down- 
ward rigidity is a key feature of many macroeconomic models. Moreover, it has been 
established that downturns in the business cycle are sharper than recoveries in that 
key macroeconomic variables, such as output and employment, fall more sharply than 
they rise. Since the standard ARMA model relies on linear difference equations, new 
dynamic specifications are necessary to capture nonlinear behavior. In fact, research in 
this new area of time-series econometrics seems to be growing exponentially (itself, a 
nonlinear process). 


1. LINEAR VERSUS NONLINEAR ADJUSTMENT 


On a long automobile trip to a new location, you might take along a road atlas. Since 
the earth is not flat, the maps contained in the atlas are a linear approximation to the 
actual path of your journey. Nevertheless, for most trips, such a linear approximation is 
extremely useful. Try to envision the nuisance of a nonlinear road atlas. For other types 
of trips, the linearity assumption is clearly inappropriate. It would be disastrous for 
NASA to use a flat map of the earth to plan the trajectory of a rocket launch. Similarly, 
the assumption that economic processes are linear can provide useful approximations to 
the actual time paths of economic variables. Nevertheless, policy makers could make a 
serious error if they ignore the empirical evidence that the unemployment rate increases 
more sharply than it decreases. 

One example of a nonlinear model that has been used in the literature is the threshold 
autoregressive (TAR) model. To explain how it might be useful, let $r_{Lt}$ and $r_{St}$ be 
the long-term and short-term interest rates on two similar financial instruments. Suppose 
that the spread, defined as $s_t = r_{Lt} - r_{St}$, adjusts to the long-run value $\bar{s}$. A simple 
AR(1) representation of the dynamic adjustment mechanism might be: 


$$s_t = a_0 + a_1 s_{t-1} + \varepsilon_t \quad \text{where } 0 < a_1 < 1.$$


For our purposes, it is convenient to define $\bar{s}$ as the long-run value $a_0/(1 - a_1)$ and 
write the adjustment process as 


$$s_t = \bar{s} + a_1 (s_{t-1} - \bar{s}) + \varepsilon_t$$


If $s_t = \bar{s}$, the system is said to be in long-run equilibrium. In other circumstances, the 
proportion $a_1$ of the current period's deviation from the long-run value tends to persist into the 
next period. Instead of displaying linear adjustment, suppose that interest rate spreads 
display a nonlinear adjustment pattern. Periods in which the spread is low relative to 
its long-run value (so that $s_{t-1} - \bar{s} < 0$) are far more persistent than periods in which 
$s_{t-1} - \bar{s} > 0$. It is possible to model these differing degrees of persistence using: 


$$s_t = \begin{cases} \bar{s} + a_1 (s_{t-1} - \bar{s}) + \varepsilon_{1t} & \text{when } s_{t-1} > \bar{s} \\ \bar{s} + a_2 (s_{t-1} - \bar{s}) + \varepsilon_{2t} & \text{when } s_{t-1} \le \bar{s} \end{cases} \qquad (7.1)$$


where $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are white-noise processes. 

In (7.1), when $s_{t-1}$ is above the threshold value $\bar{s}$, the spread follows the AR(1) 
process $s_t = \bar{s} + a_1(s_{t-1} - \bar{s}) + \varepsilon_{1t}$, and when $s_{t-1}$ is below the threshold, the spread follows 
the AR(1) process $s_t = \bar{s} + a_2(s_{t-1} - \bar{s}) + \varepsilon_{2t}$. As long as $|a_2| > |a_1|$, periods when 
$s_{t-1} < \bar{s}$ will tend to be more persistent than other periods. 
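
A small simulation can make the asymmetric persistence in (7.1) concrete. The sketch below is only illustrative: the parameter values ($a_1 = 0.3$, $a_2 = 0.9$, $\bar{s} = 0$), the use of a single shock series with the same variance in both regimes, and the function names are all assumptions made for the example. It compares the average length of spells spent above and below the threshold.

```python
import numpy as np

def simulate_tar(a1, a2, s_bar=0.0, T=5000, sigma=1.0, seed=0):
    """Simulate (7.1): a1 applies when s[t-1] > s_bar, a2 otherwise."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, T)   # same shock variance in both regimes (assumption)
    s = np.empty(T)
    s[0] = s_bar
    for t in range(1, T):
        a = a1 if s[t - 1] > s_bar else a2
        s[t] = s_bar + a * (s[t - 1] - s_bar) + eps[t]
    return s

def mean_spell_length(flags):
    """Average length of consecutive runs of True values."""
    lengths, run = [], 0
    for f in flags:
        if f:
            run += 1
        elif run > 0:
            lengths.append(run)
            run = 0
    if run > 0:
        lengths.append(run)
    return float(np.mean(lengths)) if lengths else 0.0

s = simulate_tar(a1=0.3, a2=0.9)
print("mean spell above s_bar:", mean_spell_length(s > 0.0))
print("mean spell below s_bar:", mean_spell_length(s <= 0.0))
```

With $|a_2| > |a_1|$ the spells below the threshold should come out longer on average, which is the sense in which low-spread periods are more persistent.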
To better illustrate the difference between linear and nonlinear adjustment, consider 
the homogeneous part of the AR(1) model: 


$$y_t = a_1 y_{t-1}$$


Given that $-1 < a_1 < 1$, we know that the long-run mean is such that $E y_t = 0$. The 
nature of the adjustment process is such that the proportion $a_1$ of any current deviation from 
the long-run equilibrium persists into the next period. For example, if $a_1 = 0.5$ and the 
initial condition is such that $y_{t-1} = 1.0$, it immediately follows that $E_{t-1} y_t = 0.5$ and 
$E_{t-1} y_{t+1} = 0.25$. Let $\{y_t^*\}$ denote the specific sequence {1, 0.50, 0.25, 0.125, ...} 
generated by assuming the initial condition $y_{t-1} = 1$. The linearity of the adjustment 
process can be demonstrated by considering alternative values for $y_{t-1}$. If the initial 
condition is such that $y_{t-1} = 2$, the subsequent values of the new sequence are exactly 
twice those of the previous case. In fact, multiplying the initial value of $y_{t-1}$ by any 
scalar $\lambda$ results in the sequence $\{\lambda y_t^*\}$. The phase diagram shown in Panel (a) of 
Figure 7.1 represents the linear nature of this adjustment process. The solid straight 
line labeled AOB is constructed to have a slope equal to $a_1$. Hence, for any value of 
$y_{t-1}$, you can obtain the next value in the sequence by using line AOB to project $y_{t-1}$ 
onto the $y_t$ axis. Since the slope is constant, any scalar multiple of $y_{t-1}$ will result in 
the proportional value of $y_t$. As shown in the figure, if $y_{t-1} = 1$, the expected value of 
$y_t = a_1$, and if $y_{t-1} = 2$, the expected value of $y_t = 2a_1$. 

Also note that adjustment is symmetric around zero. If $y_{t-1} = -1$, $E_{t-1} y_t = -0.5$ 
and $E_{t-1} y_{t+1} = -0.25$ and so on. Hence, for the linear model, multiplying the initial 
condition $y_{t-1}$ by $-1$ results in the sequence $\{-y_t^*\}$. 

Now suppose that the phase diagram is such that adjustment occurs along the 
kinked line passing through A'OB. Thus $y_t = a_1 y_{t-1}$ when $y_{t-1} > 0$ and $y_t = a_2 y_{t-1}$ 
when $y_{t-1} \le 0$. Again, if $y_{t-1} = 1$, the next value in the sequence equals $a_1$. However, 
if $y_{t-1} = -1$, the next value in the sequence equals $-a_2$. Since $a_2 > a_1$, it should be 
clear that the $\{y_t\}$ sequence will approach zero more slowly when beginning 
from a negative value of $y_{t-1}$ than from a positive value. Hence, the adjustment process is 
not linear since the choice $\lambda y_{t-1}$ does not necessarily result in the sequence $\{\lambda y_t^*\}$. 
This is precisely the type of adjustment represented by equation (7.1); if $a_2 > a_1$ and 
$\bar{s} = 0$, the $\{s_t\}$ sequence has precisely the phase diagram shown by A'OB in Panel (a) 
of Figure 7.1. 

FIGURE 7.1 Two Nonlinear Adjustment Paths [Panel (a) and Panel (b); the straight line is labeled $y_t = a_1 y_{t-1}$] 
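
The scaling property that separates the two adjustment paths can be checked numerically. In the sketch below the slopes 0.5 and 0.8 are illustrative choices with $a_2 > a_1$, and `path` is a hypothetical helper that iterates the deterministic phase diagram: the straight line AOB when both slopes are equal, the kinked line A'OB otherwise.

```python
def path(y0, a_pos, a_neg, n=5):
    """Iterate y[t] = a_pos*y[t-1] if y[t-1] > 0, else a_neg*y[t-1]."""
    ys = [y0]
    for _ in range(n):
        prev = ys[-1]
        ys.append((a_pos if prev > 0 else a_neg) * prev)
    return ys

a1, a2 = 0.5, 0.8  # illustrative values with a2 > a1

linear_1 = path(1.0, a1, a1)     # straight line AOB, start at 1
linear_2 = path(2.0, a1, a1)     # straight line AOB, start at 2
kinked_pos = path(1.0, a1, a2)   # kinked line A'OB, start at 1
kinked_neg = path(-1.0, a1, a2)  # kinked line A'OB, start at -1

print(linear_1)    # [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
print(linear_2)    # every entry is twice the corresponding entry above
print(kinked_pos)  # identical to linear_1: only the positive branch is used
print(kinked_neg)  # decays at rate a2, so it is not -1 times kinked_pos
```

Doubling the initial condition doubles the whole linear path, whereas multiplying the initial condition by -1 on the kinked line changes the rate of decay as well as the sign.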

A different type of nonlinear model is needed to represent the process of gravitational 
attraction. From elementary physics, we know that the speed of an object in space 
will increase as it falls toward the earth. We can represent the earth as being located 
at point 0 and suppose that the object in space is attracted to point 0. If $y_t$ denotes the 
distance of the object from 0 at time $t$, gravitational attraction can be represented by 
the curve passing through AOB in Panel (b) of Figure 7.1. As shown in the figure, if we 
let $y_{t-1} = 1$, the value of $y_t$ will be $a_1$. Instead, if $y_{t-1}$ is 0.5, the value of $y_t$ must be less 
than $a_1/2$. Since the choice $\lambda y_{t-1}$ does not result in the sequence $\{\lambda y_t^*\}$, the adjustment process 
is not linear. The straight line $y_t = a_1 y_{t-1}$ passing through AOB does not capture this 
feature of the adjustment process. 

Take a moment to imagine other types of nonlinear processes. For example, 
transport costs might deter arbitrage of a slight discrepancy between cotton prices in 
Alabama and Mississippi. In contrast, large price discrepancies might be eliminated 
almost immediately. If you own a car, it should not take long to convince you that 
the behavior of gasoline prices is nonlinear. Clearly, gasoline price increases are far 
sharper than price decreases. You should be able to think of several other examples. 
The point is that once we decide to leave the realm of linearity, there are many potential 
types of nonlinearity. It can be especially important to determine the most appropriate 
form of the nonlinearity. After all, adopting an incorrect nonlinear specification may 
be more problematic than simply ignoring the nonlinearity altogether. Since selecting 
the proper nonlinear model can be difficult, it is not surprising that this remains an 
important area of current research. However, some special forms of nonlinearity 
have proven to be particularly useful in applied time-series research. We begin by 
presenting an overview of some simple nonlinear models. 


2. SIMPLE EXTENSIONS OF THE ARMA MODEL 


The simplest form of the nonlinear autoregressive (NLAR) model is 


$$y_t = f(y_{t-1}) + \varepsilon_t$$


This is a first-order nonlinear autoregressive model, denoted by NLAR(1), in that 
the longest lag length is one. It is possible to reparameterize the model in a more interesting 
way: 

$$y_t = a_1(y_{t-1}) \, y_{t-1} + \varepsilon_t \qquad (7.2)$$


where $a_1(y_{t-1}) \cdot y_{t-1} = f(y_{t-1})$. 

Equation (7.2) looks exactly like an AR(1) model except for the fact that the autore- 
gressive coefficient a, is allowed to be a function of the value of y,_,. If we do not know 
the functional form of f( ), the usual dichotomy between nonlinearity in variables and 
time-varying parameters is not really clear-cut. It can be very difficult for a statistical 
test to detect the difference between a model in which some of the regressors are not 
raised to the power one and a model in which the parameters are varying over time. 




More generally, the p-th order nonlinear autoregressive model is: 

$$y_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-p}) + \varepsilon_t \qquad (7.3)$$


and is denoted by NLAR(p). 

The difficulty in estimating (7.3) is that the functional form of f( ) is unknown. 
One way to proceed is to use a Taylor series approximation of the unknown functional 
form. For the NLAR(2) model $y_t = f(y_{t-1}, y_{t-2}) + \varepsilon_t$ the Taylor series approximation 
using terms no higher than order three is: 


$$y_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_{12} y_{t-1} y_{t-2} + a_{11} y_{t-1}^2 + a_{22} y_{t-2}^2 + a_{112} y_{t-1}^2 y_{t-2} + a_{122} y_{t-1} y_{t-2}^2 + a_{111} y_{t-1}^3 + a_{222} y_{t-2}^3 + \varepsilon_t$$


For the more general NLAR(p) we need a more compact notation. A simple way 
of writing such a Taylor series approximation is: 


$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{i=1}^{p} \sum_{j=1}^{p} \sum_{k=1}^{r} \sum_{l=1}^{s} a_{ijkl} \, y_{t-i}^{k} y_{t-j}^{l} + \varepsilon_t \qquad (7.4)$$


where p is the order of the process and r and s are integers that are greater than or equal 
to 1. In order to avoid a very large number of parameters, the sum of r and s is usually 
restricted to be less than or equal to 4. 


The GAR Model 


Equation (7.4) is called the Generalized Autoregressive (GAR) model. GAR models 
extend the standard AR model by including various powers of the lagged values $y_{t-i}$ 
and cross-products of the powers of $y_{t-i}$ and $y_{t-j}$. As a Taylor series approximation, the 
GAR model is capable of mimicking a wide variety of functional forms: all that is 
required is that the function be differentiable. Moreover, the model is easy to estimate; 
simply form the variables $y_{t-i}^k$ and their cross-products and estimate the model using 
OLS. A test for nonlinearity can be carried out directly since the linear model is nested 
within the GAR model. If it is not possible to reject the null hypothesis that all values of 
$a_{ijkl} = 0$, it can be concluded that the process is linear. On the downside, the resulting 
model is likely to be overparameterized. This is especially true if the number of lags in 
the model is more than two. You can use traditional t-tests and F-tests to pare down the 
number of parameters estimated. However, this can be tricky since the regressors are 
likely to be highly correlated. For example, the variable $y_{t-1}^2$ will clearly be correlated 
with $y_{t-1}^4$. As such, the usual practice is to pare down the equation using the AIC or SBC. 
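
The OLS estimation and the nested F-test described above can be sketched as follows. This is a minimal illustration rather than a full specification search: the data-generating process, the single lag, the choice of powers (squares and cubes of $y_{t-1}$), and the function names are assumptions made for the example.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def gar_f_test(y):
    """F-test of a linear AR(1) against a GAR with y[t-1]^2 and y[t-1]^3."""
    y_t, y_lag = y[1:], y[:-1]
    ones = np.ones_like(y_lag)
    X_lin = np.column_stack([ones, y_lag])                       # restricted (linear) model
    X_gar = np.column_stack([ones, y_lag, y_lag**2, y_lag**3])   # unrestricted GAR
    _, ssr_r = ols(X_lin, y_t)
    beta, ssr_u = ols(X_gar, y_t)
    q = X_gar.shape[1] - X_lin.shape[1]      # number of nonlinear terms tested
    dof = len(y_t) - X_gar.shape[1]
    return beta, ((ssr_r - ssr_u) / q) / (ssr_u / dof)

# Hypothetical data: an AR(1) with a cubic term and small shocks.
rng = np.random.default_rng(1)
T = 2000
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] - 0.5 * y[t - 1] ** 3 + rng.normal(0.0, 0.3)

beta, F = gar_f_test(y)
print("GAR coefficients:", np.round(beta, 3))
print("F-statistic for the nonlinear terms:", round(F, 2))
```

A large F-statistic relative to the critical value from a standard F-table rejects the linear model that is nested within the GAR specification.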


The Bilinear Model 


Just as a parsimonious ARMA model can closely approximate a high-order AR(p) process, 
it is possible to use moving average terms in a nonlinear model. Consider the 
simple bilinear (BL) model: 


$$y_t = a_0 + a_1 y_{t-1} + \beta_1 \varepsilon_{t-1} + c_1 \varepsilon_{t-1} y_{t-1} + \varepsilon_t$$
The intent is to use moving average terms and the interactions of autoregressive 
and MA terms to approximate a high-order GAR model. As such, bilinear models are a 
natural extension of ARMA models in that they add the cross-products of $y_{t-i}$ and $\varepsilon_{t-j}$ 
to account for nonlinearity. The general form of the bilinear model BL(p, q, r, s) is 


$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \beta_i \varepsilon_{t-i} + \sum_{i=1}^{r} \sum_{j=1}^{s} c_{ij} \, y_{t-i} \varepsilon_{t-j} \qquad (7.5)$$

Notice that the linear ARMA(p, q) model is nested within (7.5); if all values of 
$c_{ij} = 0$, (7.5) is identical to an ARMA(p, q) model. As with the GAR model, the bilinear 
model can be viewed as having stochastic parameter variation. To understand the point, 
consider the BL model: 


$$y_t = a_0 + a_1 y_{t-1} + c_1 y_{t-1} \varepsilon_{t-1} + \varepsilon_t$$


so that: 

$$y_t = a_0 + (a_1 + c_1 \varepsilon_{t-1}) y_{t-1} + \varepsilon_t \qquad (7.6)$$


Equation (7.6) looks like an autoregressive model except for the fact that the autoregressive 
coefficient is $a_1 + c_1 \varepsilon_{t-1}$. In a sense, the autoregressive coefficient is a random 
variable with a mean equal to $a_1$. If $c_1$ is positive, the autoregressive coefficient will 
increase with $\varepsilon_{t-1}$. In this way positive $\varepsilon_{t-1}$ shocks will be more persistent than negative 
shocks. 

Now for a little quiz. You cannot use OLS to estimate (7.5) or (7.6) since you cannot 
directly form the variables $y_{t-i} \varepsilon_{t-j}$. The question is: If you have the single time series 
$\{y_t\}$, how can you estimate the series as a bilinear process? The standard procedure 
is to use maximum likelihood estimation. Many of the standard econometric software 
packages allow you to perform the estimation using a straightforward generalization 
of the method developed in the Supplementary Manual (see Appendix 1 of Chapter 2) 
for the estimation of an MA process. 
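
One way to see why estimation is still feasible: for any trial parameter values the shocks can be rebuilt recursively from the data, so the sum of squared residuals can be evaluated even though $\varepsilon_{t-1}$ is never observed. The sketch below does this for the special case (7.6) with $a_0 = 0$ and picks the parameter pair with the smallest sum of squares from a coarse grid. It is a conditional least squares illustration (which corresponds to a conditional likelihood under normal errors), not the routine in the Supplementary Manual; the function names, true parameter values, and grid are all assumptions.

```python
import numpy as np

def simulate_bl(a1, c1, T=1000, seed=0):
    """Simulate y[t] = (a1 + c1*eps[t-1])*y[t-1] + eps[t], i.e. (7.6) with a0 = 0."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, 1.0, T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = (a1 + c1 * eps[t - 1]) * y[t - 1] + eps[t]
    return y

def ssr(y, a1, c1):
    """Rebuild the shocks recursively for trial parameters and sum their squares."""
    e_prev, total = 0.0, 0.0   # the unobserved initial shock is set to zero
    for t in range(1, len(y)):
        e = y[t] - (a1 + c1 * e_prev) * y[t - 1]
        total += e * e
        e_prev = e
    return total

def fit_bl(y, grid=None):
    """Coarse grid search for the (a1, c1) pair that minimizes the SSR."""
    grid = np.linspace(0.0, 0.8, 17) if grid is None else grid
    y = [float(v) for v in y]          # plain floats: an exploding recursion gives inf/nan
    best = (float("inf"), None, None)
    for a1 in grid:
        for c1 in grid:
            value = ssr(y, float(a1), float(c1))
            if value < best[0]:        # inf and nan never pass this comparison
                best = (value, float(a1), float(c1))
    return best[1], best[2]

y = simulate_bl(a1=0.5, c1=0.4)
print("estimated (a1, c1):", fit_bl(y))
```

In practice a numerical optimizer would replace the grid, but the recursion for the residuals is the essential step.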


An Example 


Rothman (1998) compared the in-sample fit and out-of-sample forecasting performance 
of a number of nonlinear models of the U.S. unemployment rate. Toward this 
end, he detrended the log of the unemployment rate and estimated the following three 
models over the 1948Q1 to 1979Q4 period: 


AR:   $u_t = 1.563 u_{t-1} - 0.670 u_{t-2} + \varepsilon_t$ 
      (22.46) (-10.06) 


GAR:  $u_t = 1.500 u_{t-1} - 0.553 u_{t-2} - 0.745 u_{t-2}^3 + \varepsilon_t$;  variance ratio = 0.965 
      (23.60) (-6.72) (-2.33) 


BL:   $u_t = 1.910 u_{t-1} - 0.690 u_{t-2} - 0.585 u_{t-1} \varepsilon_{t-3} + \varepsilon_t$;  variance ratio = 0.936 
      (24.11) (-10.55) (-2.08) 


where $u_t$ = the detrended log of the unemployment rate 
variance ratio = the ratio of the residual variance of the estimated model 
to the residual variance of the AR model 



The AIC was used to select the most appropriate values of p and q from the general 
class of ARMA(p, q) models. The AR(2) specification yielded the best fit from the class 
of linear ARMA models. A general specification search within the class of GAR models 
was undertaken and the AIC was used to select the one with the best fit. Simply, for the 
given lag length of two, all models in the form of (7.4) were estimated and the one with 
the lowest AIC was retained. Notice that only the cubic term on the second lag of the 
unemployment rate was deemed to be important. Since the GAR model incorporates 
the AR(2) model as a special case, it is not surprising that it has a smaller residual 
variance. As Rothman indicates, it is instructive to write the estimated GAR model as 


$$u_t = 1.500 u_{t-1} - [0.553 + 0.745 u_{t-2}^2] u_{t-2} + \varepsilon_t$$


In this form, the GAR model can be viewed as an AR(2) process such that the 
coefficient on the second lag is $-[0.553 + 0.745 u_{t-2}^2]$. As such, large deviations from 
trend unemployment (so that $u_{t-2}^2$ is large) are associated with lower autoregressive 
persistence than small deviations from trend. In other words, the speed of adjustment is faster 
when unemployment is far from its trend value than when it is close to the trend. Hence, 
the speed of adjustment is opposite to that of gravitational attraction. As an exercise, 
you should sketch the phase diagram for this adjustment process and compare your 
answer to Panel (b) of Figure 7.1. 
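
A quick numerical check, using only the GAR coefficients reported above, shows how the implied coefficient on the second lag moves with the size of the deviation from trend. The function name and the trial values of $u_{t-2}$ are illustrative choices.

```python
def second_lag_coefficient(u_lag2):
    """Coefficient on u[t-2] implied by the estimated GAR model."""
    return -(0.553 + 0.745 * u_lag2 ** 2)

for u in (0.0, 0.1, 0.3, 0.5):
    coef = second_lag_coefficient(u)
    print(f"u[t-2] = {u:4.1f} -> coefficient on u[t-2] = {coef:.3f}")
```

At $u_{t-2} = 0$ the coefficient equals the linear term -0.553, and it becomes more negative as the deviation from trend grows in either direction.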

Of the three models, the BL model has the smallest residual variance. The general 
BL model in the form of (7.5) was estimated for various values of r and s. Again the 
AIC was used to select the best fitting model from this class. Notice that the estimated 
bilinear model uses the cross-product $u_{t-1} \varepsilon_{t-3}$ even though the linear model contains 
only two lags (i.e., r and s were allowed to exceed the order of the linear portion of 
the equation). Rothman indicates that $u_{t-1}$ and $\varepsilon_{t-3}$ are positively correlated. Since 
the coefficient on $u_{t-1} \varepsilon_{t-3}$ is negative, large shocks to the unemployment rate imply a 
faster speed of adjustment than small shocks. As $u_{t-1}$ and $\varepsilon_{t-3}$ tend to move together, 
the larger $u_{t-1} \varepsilon_{t-3}$, the smaller is the degree of persistence. 


3. TESTING FOR NONLINEARITY 


Before introducing other types of nonlinear models, it is important to be aware of sev- 
eral standard tests for the presence of nonlinearity. Pretesting for nonlinearity can help 
protect you from overfitting the data. Recursive estimation and the CUSUM test devel- 
oped in Chapter 2 can be helpful in detecting nonlinearities. This section will present a 
number of additional procedures that have been developed to determine if the data seem 
to be nonlinear and to help to determine the form of the nonlinearity. Be forewarned 
that no set of tests can actually pin down the proper form of nonlinearity. Rather, the 
tests can only suggest the form of the nonlinearity. 


The ACF and the McLeod-Li Test 


In estimating an ARMA model, the autocorrelation function can help you select the 
proper values of p and q, and the ACF of the residuals is an important diagnostic tool. 
Unfortunately, the ACF as used in linear models may be misleading for nonlinear models. 
The reason is that the autocorrelation coefficients measure the degree of linear 
association between $y_t$ and $y_{t-i}$. As such, the ACF may fail to detect important nonlinear 
relationships present in the data. Consider the following example: 


$$y_t = \varepsilon_{t-1}^2 + \varepsilon_t \qquad (7.7)$$


where $\{\varepsilon_t\}$ is a normally distributed white-noise process. 

Since $y_{t-1}$ is a function of $\varepsilon_{t-1}$, the value of $y_t$ is dependent on the value of $y_{t-1}$. 
Nevertheless, with a little bit of algebra, it is possible to show that all of the autocorrelations 
are equal to zero. To derive this result, call $\mathrm{var}(\varepsilon_t) = \mathrm{var}(\varepsilon_{t-i}) = \sigma^2$. If you take 
the expectation of (7.7), it follows that $E y_t = E y_{t-i} = \sigma^2$. Thus, up to division by the 
variance of $y_t$, the autocorrelations are 


$$\rho_i = E[(y_t - \sigma^2)(y_{t-i} - \sigma^2)]$$

$$= E[(\varepsilon_{t-1}^2 + \varepsilon_t - \sigma^2)(\varepsilon_{t-i-1}^2 + \varepsilon_{t-i} - \sigma^2)]$$

$$= E[\varepsilon_{t-1}^2 \varepsilon_{t-i-1}^2 + \varepsilon_{t-1}^2 \varepsilon_{t-i} - \varepsilon_{t-1}^2 \sigma^2 + \varepsilon_t \varepsilon_{t-i-1}^2 + \varepsilon_t \varepsilon_{t-i} - \varepsilon_t \sigma^2 - \sigma^2 \varepsilon_{t-i-1}^2 - \sigma^2 \varepsilon_{t-i} + \sigma^2 \sigma^2]$$

Note that $E(\varepsilon_{t-1}^2 \varepsilon_{t-i-1}^2) = \sigma^2 \sigma^2$, $E(\varepsilon_t \varepsilon_{t-i}) = 0$ and $E(\varepsilon_t \sigma^2) = 0$. As such: 


$$\rho_i = \sigma^2 \sigma^2 + E \varepsilon_{t-1}^2 \varepsilon_{t-i} - \sigma^2 \sigma^2 - \sigma^2 \sigma^2 + \sigma^2 \sigma^2 = E \varepsilon_{t-1}^2 \varepsilon_{t-i}$$

Clearly, all values of $E \varepsilon_{t-1}^2 \varepsilon_{t-i} = 0$ if $i \ne 1$ so that $\rho_i = 0$ for every $i \ne 0$ other than $i = 1$. 
Moreover, if $\varepsilon_t$ is normally distributed, the third moment $E \varepsilon_t^3 = E \varepsilon_{t-1}^3 = 0$, so that $\rho_1 = 0$ as well. 
Now suppose that you observe the sample ACF for $\{y_t\}$ but are unaware that the data 
were generated by (7.7). Based on the observation that the sample autocorrelations are 
small, you might mistakenly conclude that the series is white noise. You would not 
be the first person to fall into the trap of confusing a lack of correlation with statistical 
independence. Although the autocorrelations are zero, the value of $y_t$ is clearly 
dependent on the value of $y_{t-1}$. 
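
The distinction between zero autocorrelation and independence in (7.7) can be illustrated with simulated data. The sketch below assumes standard normal shocks, a sample of 50,000 observations, and a hypothetical helper `autocorr`; it contrasts the ordinary sample ACF of $\{y_t\}$ with the correlation between $y_t$ and a nonlinear function of its own past, $y_{t-1}^2$.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(x[lag:] @ x[:-lag] / (x @ x))

rng = np.random.default_rng(42)
T = 50000
eps = rng.normal(0.0, 1.0, T + 1)
y = eps[:-1] ** 2 + eps[1:]          # y[t] = eps[t-1]^2 + eps[t], as in (7.7)

print("ACF of y, lags 1-3:", [round(autocorr(y, k), 3) for k in (1, 2, 3)])

# Dependence appears once a nonlinear function of the past is used.
dep = float(np.corrcoef(y[1:], y[:-1] ** 2)[0, 1])
print("corr(y[t], y[t-1]^2):", round(dep, 3))
```

The sample autocorrelations should be close to zero, while the correlation with the squared lag should be clearly positive because both $y_t$ and $y_{t-1}^2$ contain $\varepsilon_{t-1}^2$.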

Since we are interested in nonlinear relationships in the data, a useful diagnostic 
tool is to examine the ACF of the squares or cubed values of a series. For example, 
the ACF of $y_t^2$ (or the squares of the residuals from an estimated equation) can reveal 
a nonlinear pattern. To illustrate the point, Granger, Tjostheim, and Teräsvirta (2011) 
show that the ACF from chaos may be indicative of white noise but that the ACF of 
squared values of the sequence may be large. A nonexplosive sequence is chaotic if it 
is generated from a deterministic difference equation such that it does not converge to 
a constant or to a repetitive cycle. Consider the following chaotic process: 


$$y_t = 4 y_{t-1} (1 - y_{t-1}) \quad \text{for } 0 < y_1 < 1 \qquad (7.8)$$


In (7.8), $y_t$ is related to the level and the squared value of $y_{t-1}$. However, the autocorrelations 
of $\{y_t\}$ will all be small but the ACF of $\{y_t^2\}$ will be large. To follow along 
using your software package, set $y_1 = 0.7$ and generate the next 99 values of $\{y_t\}$ using 
(7.8). Even though the sequence is perfectly predictable, you should find that the first 
six autocorrelations are 


ρ_1       ρ_2       ρ_3      ρ_4      ρ_5       ρ_6 
-0.074    -0.072    0.008    0.032    -0.016    -0.030 


All of the correlations are less than one standard deviation from zero. However, the 
correlation coefficient between $y_t^2$ and $y_{t-1}^2$ is -0.281 and the autocorrelation coefficient 
between $y_t^3$ and $y_{t-1}^3$ is -0.386. With 100 observations, these two correlations are highly 
significant. The point of the example is to show that any neglected nonlinearity in your 
data can be checked using the ACF of the squared (or cubed) values of the series. To be 
a bit more formal, the McLeod-Li (1983) test seeks to determine if there are significant 
autocorrelations in the squared residuals from a linear equation. To perform the test, 
estimate your series using the best-fitting linear model and call the residuals $\hat{e}_t$. As in a 
formal test for ARCH errors (see Section 2 of Chapter 3), form the autocorrelations of 
the squared residuals. Let $\rho_i$ denote the sample correlation coefficient between the squared 
residuals $\hat{e}_t^2$ and $\hat{e}_{t-i}^2$ and use the Ljung-Box statistic to determine whether the squared 
residuals exhibit serial correlation. Hence, form: 


$$Q = T(T+2) \sum_{i=1}^{n} \rho_i^2 / (T - i)$$


The value Q has an asymptotic $\chi^2$ distribution with n degrees of freedom if the 
$\{\hat{e}_t^2\}$ sequence is uncorrelated. Rejecting the null hypothesis is equivalent to accepting 
that the model is nonlinear. Alternatively, you can estimate the regression: 


$$\hat{e}_t^2 = a_0 + a_1 \hat{e}_{t-1}^2 + \cdots + a_n \hat{e}_{t-n}^2 + v_t$$

If there are no nonlinearities, $a_1$ through $a_n$ should be zero. With a sample of T 
residuals, if there are no nonlinearities, the test statistic $TR^2$ will converge to a $\chi^2$ 
distribution with n degrees of freedom. In small samples you can use an F-test for 
the null hypothesis $a_1 = a_2 = \cdots = a_n = 0$. If you are astute, you will remember that 
this test was used to detect ARCH-type errors. It turns out that the McLeod-Li (1983) 
test is the exact Lagrange multiplier (LM) test for ARCH errors. However, the test has 
substantial power to detect various forms of nonlinearity. Notice that the actual form 
of the nonlinearity is not specified by the test. Rejecting the null hypothesis of linearity 
does not tell you the nature of the nonlinearity present in the data. 
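
The logistic-map illustration and the Ljung-Box calculation can be reproduced with a short script. This is a sketch under stated assumptions: the helper names are hypothetical, and the Q-statistic is applied to the raw series $y_t$ and to $y_t^2$, mirroring the ACF comparison in the text, whereas the McLeod-Li test itself would be applied to the squared residuals from a fitted linear model. No numerical output is quoted because the floating-point path of a chaotic map can differ slightly across software.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(x[lag:] @ x[:-lag] / (x @ x))

def ljung_box_q(x, n):
    """Q = T(T+2) * sum_{i=1}^{n} rho_i^2 / (T - i)."""
    T = len(x)
    return T * (T + 2) * sum(autocorr(x, i) ** 2 / (T - i) for i in range(1, n + 1))

def logistic_map(y1=0.7, T=100):
    """Generate y[t] = 4*y[t-1]*(1 - y[t-1]) starting from y1."""
    y = np.empty(T)
    y[0] = y1
    for t in range(1, T):
        y[t] = 4.0 * y[t - 1] * (1.0 - y[t - 1])
    return y

y = logistic_map()
print("ACF of y:  ", [round(autocorr(y, k), 3) for k in range(1, 7)])
print("ACF of y^2:", [round(autocorr(y ** 2, k), 3) for k in range(1, 7)])
print("Q(6) for y:  ", round(ljung_box_q(y, 6), 2))
print("Q(6) for y^2:", round(ljung_box_q(y ** 2, 6), 2))
```

To run the McLeod-Li test on your own data, pass the squared residuals from the best-fitting linear model to `ljung_box_q` and compare the result with a $\chi^2$ critical value with n degrees of freedom.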


The RESET 


The Regression Error Specification Test (RESET) also posits the null hypothesis of 
linearity against a general alternative hypothesis of nonlinearity. If the residuals from 
a linear model are independent, they should not be correlated with the regressors used 
in the estimating equation or with the fitted values. Hence, a regression of the residuals 
on these values should not be statistically significant. To perform the RESET: 


STEP 1: Estimate the best-fitting linear model. Let $\{e_t\}$ be the residuals from the 
model and denote the fitted values by $\hat{y}_t$. 

STEP 2: Select a value of H (usually 3 or 4) and estimate the regression equation: 

$$e_t = \delta z_t + \sum_{h=2}^{H} a_h \hat{y}_t^{h} \quad \text{for } H \ge 2$$

where $z_t$ is the vector that contains the variables included in the model estimated 
in Step 1. For example, if you estimate an ARMA(p, q) model, $z_t$ will 
include a constant, $y_{t-1}$ through $y_{t-p}$, and $\varepsilon_{t-1}$ through $\varepsilon_{t-q}$. Note that the 
test can also be applied to a regression model. As such, $z_t$ may also include 
exogenous explanatory variables. 


This regression should have little explanatory power if the model is truly linear. 
As such, the sample value of F should be small. Hence, you can reject linearity if 
the sample value of the F-statistic for the null hypothesis $a_2 = \cdots = a_H = 0$ exceeds 
the critical value from a standard F-table. The RESET is easy to implement, does not 
require the estimation of a large number of parameters, and has reasonable power to 
detect some types of nonlinearities. However, since the test uses integer powers of the 
fitted values, it has little power to detect asymmetric models (such as the threshold 
model shown in Panel (a) of Figure 7.1). 
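
The two steps of the RESET can be sketched for an autoregressive model as follows. The example is illustrative only: it assumes a pure AR(p) model in Step 1 (so that $z_t$ contains a constant and lags of $y_t$), uses hypothetical function names, and relies on simulated data with and without a cubic term.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def reset_f(y, p=1, H=3):
    """RESET F-statistic for an AR(p) model using powers 2..H of the fitted values."""
    T = len(y)
    y_t = y[p:]
    Z = np.column_stack([np.ones(T - p)] + [y[p - k:T - k] for k in range(1, p + 1)])
    beta, e = ols(Z, y_t)                          # Step 1: linear model and residuals
    fitted = Z @ beta
    powers = np.column_stack([fitted ** h for h in range(2, H + 1)])
    _, v = ols(np.column_stack([Z, powers]), e)    # Step 2: auxiliary regression
    ssr_r, ssr_u = float(e @ e), float(v @ v)      # e is orthogonal to Z, so SSR_r = e'e
    q = H - 1
    dof = len(y_t) - Z.shape[1] - q
    return ((ssr_r - ssr_u) / q) / (ssr_u / dof)

def simulate(cubic, T=2000, seed=0):
    """Hypothetical data: AR(1) with an optional cubic term."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.6 * y[t - 1] - cubic * y[t - 1] ** 3 + rng.normal(0.0, 0.3)
    return y

print("linear AR(1): F =", round(reset_f(simulate(0.0)), 2))
print("cubic GAR:    F =", round(reset_f(simulate(0.5)), 2))
```

The F-statistic is compared with the critical value from a standard F-table with H - 1 numerator degrees of freedom.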


Other Portmanteau Tests 


Portmanteau tests (derived from "a suitcase with multiple compartments") usually 
refer to residual-based tests that do not have a specific alternative hypothesis. The 
Ljung-Box Q-statistics are a good example of this type of catch-all test. Similarly, 
the popular Brock, Dechert, Scheinkman, and LeBaron (1996) test, called the BDS 
test, is a portmanteau test for independence. In essence, the test examines the distance 
between different pairs of residuals. Let d represent a given distance and let $\varepsilon_t$ and 
$\varepsilon_{t-i}$ be two realizations of the $\{\varepsilon_t\}$ sequence. If all values of $\{\varepsilon_t\}$ are independent, 
then the probability that the distance between any pair of residuals $(\varepsilon_i, \varepsilon_j)$ is less than d 
should be the same for all i and j. Although very popular, the BDS test responds to 
serial correlation, parameter instability, neglected nonlinearity, structural breaks, and 
other misspecification problems alike. Hence, rejecting the null hypothesis of independence 
does little to help identify the nature of the problem. Also be aware that the BDS test 
does not have especially good small-sample performance unless you bootstrap the 
critical values. 

The point is that the McLeod-Li test, the RESET, and other portmanteau tests all 
have a very general alternative hypothesis. As such, the tests are helpful in determining 
whether a nonlinear model is appropriate but not in determining the nature of the non- 
linearity. As noted by Clements and Hendry (1998, pp. 168-69), “parameter change 
appears in many guises and can cause significant forecast error when models are used in 
practice.” They also establish that it can be difficult to distinguish model misspecifica- 
tion from the problem of nonconstant parameters. As such, it is worthwhile to examine 
Lagrange multiplier tests for nonlinearity since they have a specific null hypothesis and 
a specific alternative hypothesis. 
Lagrange Multiplier Tests 


Lagrange multiplier (LM) tests can be used to test for a specific type of nonlinearity. 
Thus, an LM test can help you to select the proper functional form to use in your 
nonlinear estimation. To keep the analysis simple, we will assume that $\mathrm{var}(\varepsilon_t) = \sigma^2$ is 
constant. Let f( ) be the nonlinear functional form and let $\alpha$ denote the parameters of 
f( ). In these circumstances, the LM test can be conducted as follows: 


STEP 1: Estimate the linear portion of the model to get the residuals $\{e_t\}$. 

STEP 2: Obtain all of the partial derivatives $\partial f(\,)/\partial \alpha$ evaluated under the null hypothesis 
of linearity. Typically, these partial derivatives will be nonlinear functions 
of the regressors used in Step 1. Estimate the auxiliary regression by 
regressing $e_t$ on these partial derivatives. 

STEP 3: The value of $TR^2$ has a $\chi^2$ distribution with degrees of freedom equal to the 
number of regressors used in Step 2. If the calculated value of $TR^2$ exceeds 
the critical value from a $\chi^2$ table, reject the null hypothesis of linearity and 
accept the alternative. With a small sample, it is standard to use an F-test. 


One benefit of the method is that you need not estimate the nonlinear model itself. 
More importantly, the use of a number of LM tests can help you select the form of the 
nonlinearity. It could be the case, for example, that an LM test rejects the GAR model 
but accepts the BL model. Unfortunately, this is not the typical case. Instead, if the LM 
test accepts (rejects) the GAR model, it is likely to accept (reject) the BL model as well. 
Nevertheless, comparing the prob-values of the two can be helpful. Consider the two 
examples below. 


Two Examples 


Example 1 Suppose you want to determine whether $\{y_t\}$ has the specific GAR
form:

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \alpha_3 y_{t-1} y_{t-2} + \varepsilon_t \tag{7.9}$$

Of course, it would be straightforward to estimate (7.9) directly and obtain the
t-statistic for the null hypothesis $\alpha_3 = 0$. However, the point of this section is to illustrate
the appropriate use of the LM test. Toward this end, estimate the sequence as an
AR(2) process and obtain the residuals $\{e_t\}$. Now, you need to find the partial derivatives
of the nonlinear functional form. It should be clear that

$$\partial y_t/\partial \alpha_0 = 1; \quad \partial y_t/\partial \alpha_1 = y_{t-1}; \quad \partial y_t/\partial \alpha_2 = y_{t-2}; \quad \text{and} \quad \partial y_t/\partial \alpha_3 = y_{t-1} y_{t-2}$$

Hence, Step 2 indicates that you regress $e_t$ on a constant (i.e., a vector of 1's), $y_{t-1}$,
$y_{t-2}$, and $y_{t-1} y_{t-2}$. Thus, the auxiliary regression is

$$e_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_3 y_{t-1} y_{t-2} + v_t \tag{7.10}$$


Obtain the sample value of $TR^2$. If this value exceeds the critical value of $\chi^2$ with
four degrees of freedom, reject the null hypothesis of linearity and accept the alternative
of the GAR model. Alternatively, you can perform an F-test for the joint hypothesis
$a_0 = a_1 = a_2 = a_3 = 0$.

Example 2 A similar procedure can be used to determine whether $\{y_t\}$ has the
BL form:

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \alpha_3 \varepsilon_{t-1} y_{t-2} + \varepsilon_t$$

Again, estimate the sequence as an AR(2) process and obtain the residuals $\{e_t\}$. The
desired partial derivatives are

$$\partial y_t/\partial \alpha_0 = 1; \quad \partial y_t/\partial \alpha_1 = y_{t-1}; \quad \partial y_t/\partial \alpha_2 = y_{t-2}; \quad \text{and} \quad \partial y_t/\partial \alpha_3 = \varepsilon_{t-1} y_{t-2}$$

so that the auxiliary regression is

$$e_t = a_0 + a_1 y_{t-1} + a_2 y_{t-2} + a_3 \varepsilon_{t-1} y_{t-2} + v_t \tag{7.11}$$

Since the actual values of $\{\varepsilon_{t-1}\}$ are unobserved, use the estimated residuals to form
$e_{t-1} y_{t-2}$ in (7.11). If the sample value of $TR^2$ exceeds the critical value of $\chi^2$ with four
degrees of freedom, reject the null hypothesis of linearity and accept the alternative of
the BL model. Alternatively, you can use an F-test for the null hypothesis $a_0 = a_1 =
a_2 = a_3 = 0$.

Notice that (7.10) and (7.11) are very similar. Since $\varepsilon_{t-1}$ will be highly correlated
with $y_{t-1}$, the values of $TR^2$ from the two equations will be quite similar. Hence, the
results of both tests should be quite similar; if (7.10) indicates that a GAR model is 
appropriate, (7.11) should indicate that a BL model is appropriate. Nevertheless, the 
two tests can be useful. If both accept the null hypothesis of linearity, you can be rea- 
sonably confident that the AR(2) model is adequate. If both reject the null hypothesis, 
you can be somewhat confident that a nonlinear model is appropriate. However, unless 
the prob-values of the two tests are quite different, the tests will not provide much 
guidance as to which nonlinear form is the most appropriate. 
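To see how close the two statistics tend to be, the auxiliary regressions (7.10) and (7.11) can be computed side by side. The sketch below assumes NumPy is available; the helper names `tr2` and `gar_bl_lm` are hypothetical. Note that forming $e_{t-1} y_{t-2}$ costs one additional observation.

```python
import numpy as np

def tr2(dep, X):
    """T * R^2 from an OLS regression of dep on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((dep - dep.mean()) ** 2)
    return len(dep) * r2

def gar_bl_lm(y):
    """Return (TR^2 for the GAR test (7.10), TR^2 for the BL test (7.11))."""
    y = np.asarray(y, dtype=float)
    yt, y1, y2 = y[2:], y[1:-1], y[:-2]            # y_t, y_{t-1}, y_{t-2}
    X = np.column_stack([np.ones_like(yt), y1, y2])
    beta, *_ = np.linalg.lstsq(X, yt, rcond=None)
    e = yt - X @ beta                              # AR(2) residuals e_t
    # (7.10): add the regressor y_{t-1} * y_{t-2}
    gar = tr2(e, np.column_stack([X, y1 * y2]))
    # (7.11): add e_{t-1} * y_{t-2}; the first residual has no lag, so drop it
    bl = tr2(e[1:], np.column_stack([X[1:], e[:-1] * y2[1:]]))
    return gar, bl
```

Each value would be compared with the $\chi^2$ critical value with four degrees of freedom (9.49 at the 5% level).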


Inference with Unidentified Nuisance Parameters 


You might think it appropriate to estimate a nonlinear model and impose some set 
of parameter restrictions that allow you to test whether the model is actually linear. 
However, inference in nonlinear models is often difficult because of what is called 
the “unidentified nuisance parameter problem” or the “Davies problem.” The problem 
arises when one (or more) of the parameters of the model is not identified when the null 
hypothesis is true. The difficulty raised by the Davies problem is that it is not appropriate
to conduct inference using standard t-tests, F-tests, or $\chi^2$-tests involving parameters
that are unidentified. To better understand when the problem arises, consider the following
three examples.

Example 1 Consider the nonlinear model $y_t = \alpha_0 + \alpha_1 x_t^{\alpha_2} + \varepsilon_t$. Suppose you
estimate the model using nonlinear least squares (NLLS) and want to test whether
$\alpha_2 = 0$. It should be clear that under the null hypothesis $\alpha_2 = 0$, the values of $\alpha_0$ and $\alpha_1$
are unidentified because the model degenerates into $y_t = \alpha_0 + \alpha_1 + \varepsilon_t$. If, for example,
this sum equals 5, any two values of $\alpha_0$ and $\alpha_1$ are satisfactory as long as $\alpha_0 = 5 - \alpha_1$.
Hence, the individual values of $\alpha_0$ and $\alpha_1$ are not identified under the null hypothesis
$\alpha_2 = 0$. Similar remarks hold for the test $\alpha_1 = 0$. Under the null hypothesis $\alpha_1 = 0$, any
value of $\alpha_2$ is satisfactory. In essence, the likelihood function is invariant to the value of
$\alpha_2$ when $\alpha_1 = 0$. Note that the hypothesis $\alpha_0 = 0$ does not involve the Davies problem
since $\alpha_1$ and $\alpha_2$ can be identified even if $\alpha_0 = 0$.


Example 2 Consider the model with an endogenous break such that $y_t = \alpha_0 +
\alpha_1 y_{t-1} + \alpha_2 D_t + \varepsilon_t$ where $D_t$ is a dummy variable such that $D_t = 1$ if $t > t^*$ and $D_t = 0$
otherwise. Given that the break date $t^*$ is unknown, the value of $t^*$ needs to be estimated
along with the other parameters of the model. As such, under the null hypothesis of no
break (so that $\alpha_2 = 0$), $t^*$ is an unidentified nuisance parameter. Clearly, $t^*$ can take
on any value if $\alpha_2 = 0$.


Example 3 Consider the nonlinear model (called the logistic smooth transition
autoregressive model) $y_t = \alpha_0 + \alpha_1/[1 + \exp(-\gamma y_{t-1})] + \varepsilon_t$. If $\gamma$ is known, it is possible
to form the variable $1/[1 + \exp(-\gamma y_{t-1})]$, estimate the model, and test the null
hypothesis $\alpha_1 = 0$ directly. However, if $\gamma$ is unknown, there is an unidentified nuisance
parameter under the null hypothesis of linearity. If $\gamma = 0$, the $y_t$ series is a constant
plus noise: i.e., $y_t = \alpha_0 + \alpha_1/2 + \varepsilon_t$ (since $\exp(0) = 1$). Hence, any values of $\alpha_0$ and $\alpha_1$
will be satisfactory so long as the sum $\alpha_0 + \alpha_1/2$ is unchanged. Similarly, you cannot simply
test for linearity by testing the null hypothesis $\alpha_1 = 0$. If $\alpha_1 = 0$, the model becomes
$y_t = \alpha_0 + \varepsilon_t$ so that $\gamma$ is not identified in that its value is irrelevant.
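A small numerical check makes the identification failure in Example 3 concrete. The function name `lstar_mean` and the parameter values below are purely illustrative assumptions; only the functional form comes from the text.

```python
import math

def lstar_mean(alpha0, alpha1, gamma, y_lag):
    """Conditional mean alpha0 + alpha1 / (1 + exp(-gamma * y_lag))."""
    return alpha0 + alpha1 / (1.0 + math.exp(-gamma * y_lag))

lags = [-1.5, -0.2, 0.0, 0.7, 2.0]   # hypothetical values of y_{t-1}

# With alpha1 = 0, the fitted values are identical for every gamma,
# so the data carry no information about gamma.
a = [lstar_mean(1.0, 0.0, 0.5, v) for v in lags]
b = [lstar_mean(1.0, 0.0, 25.0, v) for v in lags]

# With gamma = 0, only the sum alpha0 + alpha1 / 2 matters,
# so alpha0 and alpha1 cannot be separated.
c = [lstar_mean(1.0, 2.0, 0.0, v) for v in lags]
d = [lstar_mean(2.0, 0.0, 0.0, v) for v in lags]

print(a == b, c == d)
```

Because the fitted values coincide, so does the sum of squared residuals, which is why the likelihood cannot distinguish between the competing parameter values.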


To be a bit formal, consider a 2-parameter model such that the log likelihood function
can be written solely as a function of the parameters $\alpha_1$ and $\alpha_2$:

$$\mathcal{L}(\alpha_1, \alpha_2)$$

In the standard case, maximizing $\mathcal{L}$ with respect to (w.r.t.) $\alpha_1$ and $\alpha_2$ leads to the
unrestricted parameter estimates. Call $\mathcal{L}_a(\alpha_1, \alpha_2)$ this maximized value. Note that the
subscript $a$ denotes that $\mathcal{L}_a(\alpha_1, \alpha_2)$ is the maximum value under the alternative hypothesis
(i.e., the unrestricted version of the model). Now suppose that we restrict $\alpha_1 = \bar{\alpha}_1$
and then maximize $\mathcal{L}$ w.r.t. $\alpha_2$. Under the null hypothesis, the maximum value of $\mathcal{L}$
is $\mathcal{L}_n(\bar{\alpha}_1, \alpha_2)$. The subscript $n$ denotes that $\mathcal{L}_n(\bar{\alpha}_1, \alpha_2)$ is the maximum value under the
null $\alpha_1 = \bar{\alpha}_1$. Now we can perform a likelihood ratio test by forming $r$ as follows:

$$r = 2[\mathcal{L}_a(\alpha_1, \alpha_2) - \mathcal{L}_n(\bar{\alpha}_1, \alpha_2)]$$

If the null hypothesis is true, the estimate of $\alpha_1$ should be $\bar{\alpha}_1$ and the estimate of $\alpha_2$
should be the same under the null and under the alternative. Hence, $\mathcal{L}_a(\alpha_1, \alpha_2)$ should
equal $\mathcal{L}_n(\bar{\alpha}_1, \alpha_2)$ so that the expected value of $r$ is zero. Given the usual regularity
conditions, the large sample distribution of $r$ is $\chi^2$ with 1 degree of freedom. (NOTE:
In the general case, the degrees of freedom will equal the number of restrictions.)

Now suppose that $\alpha_2$ is not identified under the null hypothesis. In other words,
suppose $\partial \mathcal{L}_n/\partial \alpha_2 = 0$ for all values of $\alpha_2$. In essence, under the null hypothesis, $\alpha_2$
does not affect the likelihood function so that the value of $r$ becomes:

$$r = 2[\mathcal{L}_a(\alpha_1, \alpha_2) - \mathcal{L}_n(\bar{\alpha}_1)]$$

Now, even if the null hypothesis is true, there is no reason to believe that
$\mathcal{L}_a(\alpha_1, \alpha_2) = \mathcal{L}_n(\bar{\alpha}_1)$. Even if $\alpha_1$ is estimated without any error, so that $\alpha_1 = \bar{\alpha}_1$, the
difference between $\mathcal{L}_a(\alpha_1, \alpha_2)$ and $\mathcal{L}_n(\bar{\alpha}_1)$ will depend on the value of $\alpha_2$. In general,
the expected value of $r$ will not equal zero so that $r$ does not have a standard $\chi^2$
distribution with one degree of freedom. In fact, the distribution of $r$ is non-standard
and depends on the unknown value of $\alpha_2$. Since the distribution of $r$ is unknown, it is not
possible to conduct inference on the parameter $\alpha_1$ in a standard way.

In order to circumvent the problem, Davies (1987) proposed using a supremum
test. Since the distribution of $r$ depends on $\alpha_2$, the idea is to develop critical values
using the value of $\alpha_2$ that makes it most difficult to reject the null hypothesis. If
you are able to reject the null hypothesis using these critical values, you can reject
the null regardless of the actual parameter value of $\alpha_2$. Of course, this is a very
conservative method in that you will have very large critical values (and confidence
intervals); as such, supremum tests usually have relatively low power. In actuality, it
is not too difficult to develop such critical values. After all, the value of $\alpha_2$ that makes
$\mathcal{L}_a(\alpha_1, \alpha_2) - \mathcal{L}_n(\bar{\alpha}_1)$ as large as possible (so that it becomes difficult to reject the null
hypothesis with the actual data) is the one providing the best fit. Hence, one method
to develop critical values for a supremum test entails the following Monte Carlo
method:


1. Generate the $\{y_t\}$ series under the null hypothesis. (You can bootstrap the
critical values if you create $\{y_t\}$ using the actual regression residuals from the
linear model.)

2. Estimate the best-fitting nonlinear model.

3. Obtain the t-statistic (F-statistic or $\chi^2$-statistic) for the null hypothesis that
the coefficient(s) in question equals zero.

4. Repeat this process many times so as to obtain the distribution of the relevant
test statistic. For a given significance level (say 5% in a two-tailed t-test), the
sample t-statistic can be compared to the 2.5 and 97.5 percentiles of the
generated t-statistics.
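The four steps can be sketched as a bootstrap. The example below is a sketch under several assumptions that do not come from the text: the null is an AR(1) without an intercept, the nonlinear alternative is a two-regime threshold model with an unknown threshold, and the statistic is the F-statistic for equal slopes maximized over candidate thresholds. The function names are hypothetical and NumPy is assumed to be available.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared residuals from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

def sup_f(y, trim=0.15):
    """Largest F-statistic for equal slopes across regimes over candidate thresholds."""
    yt, yl = y[1:], y[:-1]
    ssr0 = ssr(yt, yl[:, None])                       # linear AR(1) under the null
    ordered = np.sort(yl)
    cands = ordered[int(trim * len(yl)): int((1 - trim) * len(yl))]
    best = min(                                       # Step 2: best-fitting nonlinear model
        ssr(yt, np.column_stack([yl * (yl > tau), yl * (yl <= tau)]))
        for tau in cands
    )
    return (ssr0 - best) / (best / (len(yt) - 2))     # Step 3: one restriction

def bootstrap_critical_value(y, reps=200, level=0.95, seed=0):
    """Step 1 and Step 4: resample the linear residuals and repeat the estimation."""
    rng = np.random.default_rng(seed)
    yt, yl = y[1:], y[:-1]
    a = (yl @ yt) / (yl @ yl)                         # AR(1) slope under the null
    e = yt - a * yl
    stats = []
    for _ in range(reps):
        shocks = rng.choice(e, size=len(y), replace=True)
        sim = np.empty(len(y))
        sim[0] = shocks[0]
        for t in range(1, len(y)):
            sim[t] = a * sim[t - 1] + shocks[t]
        stats.append(sup_f(sim))
    return float(np.quantile(stats, level))
```

The sample value of `sup_f(y)` would be compared with the bootstrapped critical value rather than with a standard F table.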


Alternatively, for step 2, you can use an LM test in which you estimate the model 
under the null hypothesis of linearity. Use the residuals from this model to conduct the 
type of LM test discussed in the previous section. Repeating this process many times 
yields the relevant test statistic. 

Before moving on, take another little quiz by trying to determine whether there are 
unidentified nuisance parameters in (7.4) and (7.5). The answer is that (7.4) and (7.5) 
do not contain unidentified nuisance parameters. Even if all of the coefficients on the
nonlinear terms in (7.4) or all $c_{ij}$ in (7.5) equal zero, the remaining parameters in the
model are identified.


4. THRESHOLD AUTOREGRESSIVE MODELS 


A regime switching model allows the behavior of $\{y_t\}$ to depend on the state of the
system. In a recession, the unemployment rate is likely to rise sharply and then slowly
decline to its long-run value. However, the unemployment rate does not fall sharply in
an economic expansion. As such, the dynamic adjustment equation for the unemployment
rate depends on whether the economy is in an expansionary state (or regime) or
in a recession. When the economy changes from an expansionary regime to a contractionary
regime, the dynamic adjustment of the unemployment rate is likely to change.
In other circumstances, regime switches might be due to the magnitude of the vari- 
able of interest, the result of an election that changes the behavior of policymakers, or 
may be completely unobservable. As you might expect, a number of regime switching 
models have been developed to analyze these types of regime changes. 

Before proceeding, you need to know that most regime switching models can be 
quite difficult to estimate. Although many software packages allow you to estimate a 
linear model by appropriately clicking on a menu, this is not true for many nonlinear 
models. In general, you need to use a statistical package that has its own programming 
language if you want to estimate a regime switching model. Threshold autoregressive 
(TAR) models of the type developed by Tong (1983, 1990) can be estimated using 
OLS. Another type of threshold model allows for gradual regime change. Such smooth 
transition autoregressive (STAR) models can be estimated using nonlinear least squares 
or maximum likelihood methods. Other nonlinear models, such as the artificial neural 
network and Markov switching models, require methods that are more sophisticated. As 
such, discussion in the text emphasizes threshold and STAR models. The Programming 
Manual that accompanies this text has a number of examples of nonlinear estimation. 


The Basic Threshold Model 


Panel a of Figure 7.1 illustrates a simple TAR process. Recall that the degree of persistence
is $a_1$ when $y_{t-1} > 0$ and $a_2$ when $y_{t-1} \le 0$. If we augment the model with a
disturbance term, the behavior of the $\{y_t\}$ sequence can be represented by

$$y_t = \begin{cases} a_1 y_{t-1} + \varepsilon_{1t} & \text{if } y_{t-1} > 0 \\ a_2 y_{t-1} + \varepsilon_{2t} & \text{if } y_{t-1} \le 0 \end{cases} \tag{7.12}$$


You can think of the equation $y_{t-1} = 0$ as being a threshold. On one side of the
threshold, the $\{y_t\}$ sequence is governed by one autoregressive process and on the
other side of the threshold, there is a different autoregressive process. Although $\{y_t\}$
is linear in each regime, the possibility of regime switching means that the entire $\{y_t\}$
sequence is nonlinear. Shocks to $\{\varepsilon_{1t}\}$ or $\{\varepsilon_{2t}\}$ are responsible for regime switching. If,
for example, $y_{t-1} > 0$, the subsequent values of the sequence will tend to decay toward
zero at the rate $a_1$. However, a negative realization of $\varepsilon_{1t}$ can cause $y_t$ to fall by such
an extent that it lies below the threshold. In the negative regime, the behavior of the
process is governed by $y_t = a_2 y_{t-1} + \varepsilon_{2t}$. As you can infer, the larger the variance of
$\{\varepsilon_{1t}\}$, the more likely is a switch from one regime to the other.

Another common variant of the TAR model is to assume that the variances of the
two error terms are equal [i.e., $\mathrm{var}(\varepsilon_{1t}) = \mathrm{var}(\varepsilon_{2t})$]. In this circumstance, (7.12) can be
written as

$$y_t = a_1 I_t y_{t-1} + a_2 (1 - I_t) y_{t-1} + \varepsilon_t \tag{7.13}$$

where $I_t = 1$ if $y_{t-1} > 0$ and $I_t = 0$ if $y_{t-1} \le 0$.

In equation (7.13), $I_t$ is an indicator function, or dummy variable, that takes on
the value of 1 if $y_{t-1}$ is above the threshold and a value of 0 otherwise. When $y_{t-1} > 0$,
$I_t = 1$ and $(1 - I_t) = 0$, so that (7.13) is equivalent to $a_1 y_{t-1} + \varepsilon_t$. When $y_{t-1} \le 0$, $I_t = 0$
and $(1 - I_t) = 1$, so that (7.13) is equivalent to $a_2 y_{t-1} + \varepsilon_t$.

Figure 7.2 provides a visual comparison of the AR, GAR, BL, and TAR models. A series
of 200 random numbers was drawn from a standardized normal distribution so as to
simulate the $\{\varepsilon_t\}$ sequence. The initial value $y_1$ was set equal to $\varepsilon_1$ and the next 199
values of $\{y_t\}$ were created according to the formula

$$y_t = 0.7 y_{t-1} + \varepsilon_t$$


Panel (a) of Figure 7.2 shows the time path of this simulated AR(1) process. Notice 
that the series fluctuates around a mean of zero. Although it may not be possible to 
discern with visual inspection alone, the degree of autoregressive decay is always the 
same; on average, 70% of the current value of $y_t$ persists into the next period. Next, the
same random numbers were used to generate the GAR process:

$$y_t = 0.7 y_{t-1} - 0.06 y_{t-1}^2 + \varepsilon_t$$

or

$$y_t = [0.7 - 0.06 y_{t-1}] y_{t-1} + \varepsilon_t$$

The nature of this particular GAR process is that it behaves as an AR(1) process
with a random coefficient. The greater the value of $y_{t-1}$, the smaller is the autoregressive
coefficient. For values of $y_{t-1} = -2$, 0, and 2, the degrees of autoregressive persistence
are 0.82, 0.7, and 0.58, respectively. This pattern can be seen in Panel (b) of Figure 7.2
because negative values of the simulated GAR process are far more persistent than
positive values. Compare Panels (a) and (b) and note the values of the two series surrounding
period 35 and period 85. You can clearly see that the GAR series returns to
zero more slowly than the AR series.

FIGURE 7.2 Comparison of Linear and Nonlinear Processes

The identical random numbers were used to construct the BL sequence shown in
Panel (c). After initializing $y_1 = \varepsilon_1$, the remaining values of the sequence were generated
from

$$y_t = 0.7 y_{t-1} - 0.3 y_{t-1} \varepsilon_{t-1} + \varepsilon_t$$

or

$$y_t = [0.7 - 0.3 \varepsilon_{t-1}] y_{t-1} + \varepsilon_t$$

In the BL model, the degree of persistence depends on the value of $\varepsilon_{t-1}$; the larger
is $\varepsilon_{t-1}$, the smaller the degree of persistence. In fact, for those periods in which $\varepsilon_{t-1} <
-1.0$, the sequence behaves like an explosive process (since the value of $0.7 - 0.3\varepsilon_{t-1}$
exceeds unity). In Panel (c), you can see the extreme movements in the BL process if
you examine the time intervals that surround period 55 and period 165. Nevertheless,
the successive values of $\varepsilon_t$ are more likely to exceed $-1$, so the BL process does not
continue its decline.

Panel (d) of Figure 7.2 illustrates the time path of the TAR process

$$y_t = 0.3 I_t y_{t-1} + 0.7 (1 - I_t) y_{t-1} + \varepsilon_t$$

where $I_t = 1$ if $y_{t-1} > 0$ and $I_t = 0$ otherwise.

When $y_{t-1} \le 0$, this TAR process behaves exactly like the AR(1) process shown
in Panel (a). Thus, the lower portions of Panels (a) and (d) are nearly identical. However,
for the TAR process, only 30% of the current value of $y_t$ tends to persist into the
subsequent period when $y_{t-1} > 0$. Hence, in contrast to the AR(1) process of Panel (a),
the TAR process displays a substantial degree of mean reversion whenever $y_{t-1} > 0$.
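The four simulated series can be generated from a common set of shocks along the following lines. This is a sketch rather than the code behind Figure 7.2: the seed and the function name `simulate` are assumptions, so the paths will not match the figure. One caveat worth keeping in mind is that, absent shocks, the GAR recursion moves away from zero once $y_{t-1}$ falls below $-5$, so an unlucky draw can make that series diverge.

```python
import random

def simulate(n=200, seed=42):
    """Generate AR, GAR, BL, and TAR paths driven by the same normal shocks."""
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ar, gar, bl, tar = [eps[0]], [eps[0]], [eps[0]], [eps[0]]   # y_1 = eps_1
    for t in range(1, n):
        ar.append(0.7 * ar[-1] + eps[t])
        gar.append(0.7 * gar[-1] - 0.06 * gar[-1] * gar[-1] + eps[t])
        bl.append(0.7 * bl[-1] - 0.3 * bl[-1] * eps[t - 1] + eps[t])
        ind = 1 if tar[-1] > 0 else 0                            # I_t
        tar.append(0.3 * ind * tar[-1] + 0.7 * (1 - ind) * tar[-1] + eps[t])
    return eps, ar, gar, bl, tar

eps, ar, gar, bl, tar = simulate()
```

Plotting the four lists against time gives a visual comparison in the spirit of the panels discussed above.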


Estimation 


Estimation of a threshold model in the form of (7.13) can be performed by simple
OLS. First construct the dummy variable $I_t$ such that $I_t = 1$ if $y_{t-1} > 0$ and $I_t = 0$ if
$y_{t-1} \le 0$. Then construct two variables, say $y^+_{t-1}$ and $y^-_{t-1}$, such that $y^+_{t-1} =
I_t y_{t-1}$ and $y^-_{t-1} = (1 - I_t) y_{t-1}$. Finally, use OLS to estimate the regression equation
$y_t = a_1 y^+_{t-1} + a_2 y^-_{t-1} + \varepsilon_t$. It is straightforward to generalize the method such that there is
a higher-order autoregressive process in each regime. For example, a more general
version of (7.13) is

$$y_t = I_t \left[ \alpha_{10} + \sum_{i=1}^{p} \alpha_{1i} y_{t-i} \right] + (1 - I_t) \left[ \alpha_{20} + \sum_{i=1}^{r} \alpha_{2i} y_{t-i} \right] + \varepsilon_t \tag{7.14}$$

where $I_t = 1$ if $y_{t-1} > \tau$ and $I_t = 0$ if $y_{t-1} \le \tau$.

In (7.14), there are two separate regimes defined by the value of $y_{t-1}$. When
$y_{t-1} > \tau$, $I_t = 1$, and $(1 - I_t) = 0$, so that (7.14) is equivalent to $\alpha_{10} + \alpha_{11} y_{t-1} + \cdots +
\alpha_{1p} y_{t-p} + \varepsilon_t$. When $y_{t-1} \le \tau$, $I_t = 0$, and $(1 - I_t) = 1$, so that (7.14) is equivalent to
$\alpha_{20} + \alpha_{21} y_{t-1} + \cdots + \alpha_{2r} y_{t-r} + \varepsilon_t$. Unlike the TAR models depicted in Figures 7.1
and 7.2, the value of the threshold $\tau$ is allowed to differ from zero. Moreover, the
particular phase diagram shown in Figure 7.1 was a special type of TAR model in that
it is continuous. The specification in equation (7.14) allows the two segments of the
phase diagram to be discontinuous at the threshold. If $\tau$ is known, the estimation of
a TAR model is straightforward. Create the dummy variable $I_t$ according to whether
$y_{t-1}$ is above or below the threshold $\tau$ and form the variables $I_t y_{t-i}$ and $(1 - I_t) y_{t-i}$.
You can then estimate the equation using OLS.
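With a known threshold, the OLS estimation described above might be coded as follows. This is a minimal sketch assuming NumPy and, for brevity, the same lag length in both regimes ($p = r$); the function name `fit_tar` is hypothetical.

```python
import numpy as np

def fit_tar(y, tau=0.0, p=1):
    """OLS estimates of a two-regime TAR in the form of (7.14) with p lags per regime.

    Returns the coefficients ordered as
    [alpha_10, alpha_11, ..., alpha_1p, alpha_20, alpha_21, ..., alpha_2p]
    together with the sum of squared residuals.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    dep = y[p:]
    # Column i-1 holds y_{t-i} for t = p+1, ..., T
    lags = np.column_stack([y[p - i: T - i] for i in range(1, p + 1)])
    ind = (lags[:, 0] > tau).astype(float)            # I_t = 1 if y_{t-1} > tau
    Z = np.column_stack([np.ones(T - p), lags])       # [1, y_{t-1}, ..., y_{t-p}]
    X = np.column_stack([ind[:, None] * Z, (1 - ind)[:, None] * Z])
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    return beta, float(resid @ resid)
```

Because the indicator splits the regressors into two blocks that are never nonzero at the same time, the coefficients match those from running a separate OLS regression on each regime.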

To use a specific example, suppose that the first seven observations of a time series
are:

t          1     2     3     4     5     6     7
y_t        0.5   0.3  -0.2   0.0  -0.5   0.4   0.6
y_{t-1}    NA    0.5   0.3  -0.2   0.0  -0.5   0.4

If the threshold $\tau = 0$, you should be able to verify that the time path of the indicator
function $I_t$ and the values of $I_t y_{t-1}$, $I_t y_{t-2}$, $(1 - I_t) y_{t-1}$, and $(1 - I_t) y_{t-2}$ are those
shown in Table 7.1.
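The construction of these variables is mechanical, as the following short sketch shows for the seven observations above. The dictionary keys are just illustrative labels, and `None` plays the role of NA.

```python
y = [0.5, 0.3, -0.2, 0.0, -0.5, 0.4, 0.6]
tau = 0.0

rows = []
for t in range(len(y)):                    # t = 0 is the first observation
    lag1 = y[t - 1] if t >= 1 else None    # y_{t-1}
    lag2 = y[t - 2] if t >= 2 else None    # y_{t-2}
    ind = None if lag1 is None else int(lag1 > tau)   # I_t
    rows.append({
        "t": t + 1,
        "y": y[t],
        "I": ind,
        "I*y1": None if ind is None else ind * lag1,
        "(1-I)*y1": None if ind is None else (1 - ind) * lag1,
        "I*y2": None if lag2 is None else ind * lag2,
        "(1-I)*y2": None if lag2 is None else (1 - ind) * lag2,
    })

for row in rows:
    print(row)
```

Each printed row corresponds to one column of the table, so the output can be checked entry by entry.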

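To estimate a model with two lags in each regime, you estimate the six values of
$\alpha_{ij}$ from the regression equation

$$y_t = \alpha_{10} I_t + \alpha_{11} I_t y_{t-1} + \alpha_{12} I_t y_{t-2} + \alpha_{20} (1 - I_t) + \alpha_{21} (1 - I_t) y_{t-1} + \alpha_{22} (1 - I_t) y_{t-2} + \varepsilon_t$$

Hence, when $y_{t-1} > 0$, $I_t = 1$ and $(1 - I_t) = 0$, so that

$$y_t = \alpha_{10} + \alpha_{11} y_{t-1} + \alpha_{12} y_{t-2} + \varepsilon_t$$

Similarly, when $y_{t-1} \le 0$, $I_t = 0$ and $(1 - I_t) = 1$, so that

$$y_t = \alpha_{20} + \alpha_{21} y_{t-1} + \alpha_{22} y_{t-2} + \varepsilon_t$$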


Table 7.1 A TAR Model with Regime Dependent Variances

t                    1     2     3     4     5     6     7
y_t                  0.5   0.3  -0.2   0.0  -0.5   0.4   0.6
y_{t-1}              NA    0.5   0.3  -0.2   0.0  -0.5   0.4
y_{t-2}              NA    NA    0.5   0.3  -0.2   0.0  -0.5
I_t                  NA    1     1     0     0     0     1
I_t y_{t-1}          NA    0.5   0.3   0.0   0.0   0.0   0.4
(1 - I_t) y_{t-1}    NA    0.0   0.0  -0.2   0.0  -0.5   0.0
I_t y_{t-2}          NA    NA    0.5   0.0   0.0   0.0  -0.5
(1 - I_t) y_{t-2}    NA    NA    0.0   0.3  -0.2   0.0   0.0

The estimation is only a bit more complicated if you want to allow the variances
of the error terms to differ across regimes. A more general version of equation (7.14)
is the two-regime TAR model:

$$y_t = \begin{cases} \alpha_{10} + \alpha_{11} y_{t-1} + \cdots + \alpha_{1p} y_{t-p} + \varepsilon_{1t} & \text{if } y_{t-1} > \tau \\ \alpha_{20} + \alpha_{21} y_{t-1} + \cdots + \alpha_{2r} y_{t-r} + \varepsilon_{2t} & \text{if } y_{t-1} \le \tau \end{cases} \tag{7.15}$$


If $\tau$ is known, you can separate the observations according to whether $y_{t-1}$ is above
or below the threshold. Each segment of (7.15) can then be estimated using OLS. The
lag lengths $p$ and $r$ can be determined as in an AR model. Hence, you can determine the
lag lengths using t-tests on the individual coefficients, F-tests on groups of coefficients,
or the AIC and/or SBC.

For example, for $\tau = 0$, sort the observations into two groups according to whether
$y_{t-1}$ is greater than or less than zero. Since values when $y_{t-1} = 0$ are included with those
when $y_{t-1} < 0$, the two regimes using the seven sample observations listed above would
look like this:

Positive               Negative
y_t      y_{t-1}       y_t      y_{t-1}
 0.3      0.5           0.0     -0.2
-0.2      0.3          -0.5      0.0
 0.6      0.4           0.4     -0.5

The two separate AR(1) processes can be estimated for each regime. For each
$(y_t, y_{t-1})$ pair, the first regression would use (0.3, 0.5), (-0.2, 0.3), and (0.6, 0.4) and
the second regression would use (0.0, -0.2), (-0.5, 0.0), and (0.4, -0.5). It is only a bit
more complicated to estimate an AR(2) model for each regime. For an AR(2), the first
regression would use the $(y_t, y_{t-1}, y_{t-2})$ values (-0.2, 0.3, 0.5) and (0.6, 0.4, -0.5) and
the second regression would use (0.0, -0.2, 0.3), (-0.5, 0.0, -0.2), and (0.4, -0.5, 0.0).

Regardless of whether you restrict the residual variances to be equal, OLS gives 
consistent estimates of the intercept and slope coefficients conditional on the threshold 
being correct. 


Unknown Threshold 


In most instances, the value of the threshold is unknown and must be estimated along 
with the other parameters of the TAR model. Fortunately, Chan (1993) shows how 
to obtain a super-consistent estimate of the threshold $\tau$. To best explain the logic of
the procedure, consider the TAR series shown in Figure 7.3. If the threshold is to be
meaningful, the series must actually cross the threshold. It would be nonsense to use a
threshold of four to estimate the TAR model since the series never crosses that threshold.
Thus, $\tau$ must lie between the maximum and minimum values of the series. In
practice, the highest and lowest 15% of the values are excluded from the search to 
ensure an adequate number of observations on each side of the threshold. Your esti- 
mates will be very imprecise if, for example, one regime has only twenty observations. 
If you have a very large number of observations, you may want to exclude only the
highest and lowest 10% of the observations as potential thresholds.

FIGURE 7.3 Estimation of the Threshold (legend: TAR series, upper 15%, lower 15%)


In the example at hand, $\tau$ should lie within the band containing the middle 70%
of the observations. Each data point within the band has the potential to be the threshold.
Thus, try a value of $\tau = y_1$ (i.e., the first observation in the band) and estimate an
equation in the form of (7.14) or (7.15). As you can see in the figure, $y_2$ lies outside
the band. Hence, there is no need to estimate a regression using $\tau = y_2$. Next, estimate
TAR models using $\tau = y_3$ and $\tau = y_4$ since these two values lie within the band.
Continue in this fashion for each observation within the band. With 200 observations,
there should be about 141 estimates of the TAR model. The regression containing the
smallest residual sum of squares contains the consistent estimate of the threshold.

Now you can see why you need to use a software package that contains a program- 
ming language. Instead of estimating the 141 equations one at a time, as illustrated in 
the Programming Manual, you could embed the estimations within a Do-End loop or 
a For-Next loop. 
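Such a loop might look as follows in Python. The sketch assumes NumPy, a one-lag model without intercepts in the form of (7.13) with a nonzero threshold allowed, and hypothetical function names; it is not taken from the Programming Manual.

```python
import numpy as np

def tar_ssr(y, tau):
    """SSR from OLS on y_t = a1*I_t*y_{t-1} + a2*(1 - I_t)*y_{t-1} + e_t."""
    dep, lag = y[1:], y[:-1]
    ind = lag > tau                                   # I_t = 1 if y_{t-1} > tau
    X = np.column_stack([lag * ind, lag * ~ind])
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    return float(resid @ resid)

def estimate_threshold(y, trim=0.15):
    """Search the middle of the ordered y_{t-1} values for the SSR-minimizing threshold."""
    y = np.asarray(y, dtype=float)
    ordered = np.sort(y[:-1])                         # candidate thresholds
    lo, hi = int(trim * len(ordered)), int((1 - trim) * len(ordered))
    candidates = ordered[lo:hi]                       # drop lowest and highest 15%
    ssrs = np.array([tar_ssr(y, tau) for tau in candidates])
    return candidates[ssrs.argmin()], candidates, ssrs
```

The function also returns the candidate thresholds and their sums of squared residuals, which is convenient if you want to inspect how the fit varies across potential thresholds.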

Rothman’s (1998) TAR estimate of the U.S. unemployment rate tells an interesting 
story. His two-regime model in the form of (7.15) is 


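$$u_t = \underset{(3.46)}{0.0529} + \underset{(16.03)}{1.349}\,u_{t-1} - \underset{(-9.37)}{0.665}\,u_{t-2} + \varepsilon_{1t} \quad \text{if } u_{t-1} > 0.062$$

$$u_t = \underset{(14.27)}{1.646}\,u_{t-1} - \underset{(-6.37)}{0.733}\,u_{t-2} + \varepsilon_{2t} \quad \text{if } u_{t-1} < 0.062$$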


There is a high-unemployment and a low-unemployment regime separated by
the value $u_{t-1} = 0.062$. Rothman notes that unemployment is more persistent in the
high-unemployment regime than the low-unemployment regime in that shocks that
increase unemployment do not decay to zero. The variance ratio for the TAR model
is 0.942. As measured by the residual sum of squares, the TAR model fits the data
better than the AR(2) and GAR models, but not as well as the BL model. Notice that
the estimated AR(2), GAR, BL, and TAR models contain 2, 3, 3, and 6 parameters
(remember that $\tau$ is an estimated parameter in the TAR model). As such, a different
pattern emerges if AIC is used to select the most appropriate model. The BL model
has the lowest value of the AIC followed by the AR, GAR, and TAR models. Based
on the AIC, most applied econometricians would discard the TAR model.


5. EXTENSIONS OF THE TAR MODEL 


Note that there is something very different about the TAR model versus the GAR and 
BL models. The latter two are designed to be useful when the functional form of the
nonlinear process is unknown. It is not surprising that some researchers view specifi- 
cations based on a Taylor series approximation as being somewhat ad hoc. In contrast, 
the TAR model posits a type of adjustment mechanism that corresponds to the state 
of the economic system. This has led to a growing popularity of TAR models and a 
number of interesting extensions. 


Selecting the Delay Parameter 


In the TAR models considered thus far, the regime is determined by the value of $y_{t-1}$.
However, it might be that the timing of the adjustment process is such that it takes more
than one period for the regime switch to occur. In such circumstances, we could allow
the regime switch to occur according to the value of $y_{t-d}$ where $d = 1, 2, 3, \ldots$ Thus,
the system would be in regime 1 if $y_{t-d} > \tau$ and in regime 2 if $y_{t-d} \le \tau$.

There are several procedures available to select the value of the delay parameter d. 
The standard procedure is to estimate a TAR model for each potential value of d. The 
one with the smallest value of the residual sum of squares yields a consistent estimate 
of the delay parameter. Alternatively, you can choose the delay parameter that leads to 
the smallest value of the AIC or the SBC. This second approach is most useful when 
the optimal values for p and r (i.e., the lag lengths in the various regimes) depend 
on the choice of d. 
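The standard procedure can be sketched as a search over candidate delays. The example assumes NumPy, a known threshold, and a one-lag model in each regime; the function names are hypothetical. Every candidate is estimated over the same sample so that the sums of squared residuals are comparable.

```python
import numpy as np

def tar_ssr_delay(y, d, tau=0.0, max_d=3):
    """SSR of y_t = a1*I_t*y_{t-1} + a2*(1 - I_t)*y_{t-1} + e_t, I_t = 1 if y_{t-d} > tau."""
    dep = y[max_d:]                          # common estimation sample for every d
    lag1 = y[max_d - 1:-1]                   # y_{t-1}
    switch = y[max_d - d: len(y) - d]        # y_{t-d}, the variable that sets the regime
    ind = switch > tau
    X = np.column_stack([lag1 * ind, lag1 * ~ind])
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    return float(resid @ resid)

def select_delay(y, max_d=3, tau=0.0):
    """Return the delay with the smallest SSR and the SSR for each candidate delay."""
    y = np.asarray(y, dtype=float)
    ssrs = {d: tar_ssr_delay(y, d, tau, max_d) for d in range(1, max_d + 1)}
    return min(ssrs, key=ssrs.get), ssrs
```

If the lag lengths are allowed to vary with $d$, the comparison could instead be based on the AIC or SBC computed from each sum of squared residuals.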


Multiple Regimes 


In some instances, it may be reasonable to assume that there are more than two regimes.
For example, if we assume that the variance of shocks does not differ across regimes,
we can write (7.1), the TAR model of the interest rate spread, in the form:

$$s_t = \begin{cases} \bar{s} + a_1 (s_{t-1} - \bar{s}) + \varepsilon_t & \text{when } s_{t-1} > \bar{s} \\ \bar{s} + a_2 (s_{t-1} - \bar{s}) + \varepsilon_t & \text{when } s_{t-1} \le \bar{s} \end{cases}$$

Now suppose that there is a transaction cost $c$ that prevents complete adjustment
of the spread to $\bar{s}$. If the gap between $s_{t-1}$ and $\bar{s}$ is less than the cost of undertaking the
transaction, it would not be profitable to switch funds between the securities. As such,
there may be a neutral band within which the spread may fluctuate. Within this band,
there are no economic incentives to act in a way that equates the spread with $\bar{s}$. Outside
of the band, however, there may be strong incentives for individuals to act in a way that
drives the spread toward $\bar{s}$. A simple way to model this behavior is with the band-TAR
model:

$$s_t = \begin{cases} \bar{s} + a_1 (s_{t-1} - \bar{s}) + \varepsilon_t & \text{when } s_{t-1} > \bar{s} + c \\ s_{t-1} + \varepsilon_t & \text{when } \bar{s} - c < s_{t-1} \le \bar{s} + c \\ \bar{s} + a_2 (s_{t-1} - \bar{s}) + \varepsilon_t & \text{when } s_{t-1} \le \bar{s} - c \end{cases}$$

For this specification, there is no tendency for mean reversion unless $s_{t-1}$
lies outside of the neutral band formed by adding and subtracting the transaction cost
from the long-run value of the spread. Hence, inside the band, the behavior of the spread
is a random walk. Balke and Fomby (1997) use this type of band threshold process to
estimate a model of the term structure of interest rates.
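A short simulation helps to show how the three regimes interact. All parameter values below (the long-run spread, the transaction cost, the adjustment speeds, and the shock variance) are hypothetical choices made only for illustration, as is the function name `simulate_band_tar`.

```python
import random

def simulate_band_tar(n=500, s_bar=1.0, c=0.5, a1=0.6, a2=0.6, sigma=0.1, seed=0):
    """Simulate a band-TAR spread; returns the series and the regime in force each period."""
    rng = random.Random(seed)
    s = [s_bar]                  # start the spread at its long-run value
    regimes = []                 # regimes[i] governs the move from s[i] to s[i + 1]
    for _ in range(n - 1):
        prev = s[-1]
        shock = rng.gauss(0.0, sigma)
        if prev > s_bar + c:                     # above the band: mean reversion
            regimes.append("upper")
            s.append(s_bar + a1 * (prev - s_bar) + shock)
        elif prev > s_bar - c:                   # inside the band: random walk
            regimes.append("band")
            s.append(prev + shock)
        else:                                    # below the band: mean reversion
            regimes.append("lower")
            s.append(s_bar + a2 * (prev - s_bar) + shock)
    return s, regimes

spread, regimes = simulate_band_tar()
```

Counting the entries of `regimes` shows how much of the sample is spent wandering inside the neutral band relative to the periods of adjustment outside it.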

In a more general multiple regime model, each regime can be represented by a 
distinct autoregressive process. As discussed in the next section, graphical techniques 
can be used to detect the presence of multiple thresholds. 


More on Estimating the Threshold 


The discussion in Section 4 gave an overview of Chan’s (1993) method of finding the 
consistent estimate of the threshold. However, there are some graphical techniques that 
can be helpful in fine-tuning the estimate. The general point is that we can think of 
the sum of squared residuals from any TAR model as being a function of the threshold 
value used in the estimation, i.e., $ssr = ssr(\tau)$. The closer we come to the true threshold
value $\tau$, the smaller should be the sum of squared residuals. Hence, $ssr$ should be
minimized at the true value of the threshold. Moreover, the sum of squared residuals 
will have several distinct local minima if there are several thresholds. This suggests the 
following method to detect the thresholds: 


STEP 1: Sort the threshold variable (i.e., sort $y_{t-1}$) from the lowest to the highest
value. Let $y^i$ denote the i-th value of the sorted series. Hence, in a sample
with $T$ observations, $y^1$ is the smallest value of $y_{t-1}$ and $y^T$ is the largest value.

STEP 2: Estimate a TAR model in the form of (7.14) or (7.15) using the successive
values of $\{y^i\}$ as thresholds. Save the sum of squared residuals associated
with each model. Since you want to maintain 15% of the observations on
each side of the threshold, use only the middle 70% of the values of $y^i$. For
example, if you have 200 observations, estimate 141 TAR models beginning
with $\tau = y^{30}$ and ending with $\tau = y^{170}$. When you are done, you will have
141 values of the sum of squared residuals.

STEP 3: Create a graph of the successive values of the sum of squared residuals. If
$ssr(30)$ is the sum of squared residuals using $\tau = y^{30}$ and $ssr(170)$ is the
sum of squared residuals using $\tau = y^{170}$, plot the values of $ssr(30)$ through
$ssr(170)$.


In the absence of threshold behavior, there should be no clear relationship between
the sum of squared residuals and the potential thresholds. However, if there is a single
threshold, there should be a single trough in the graph you create in Step 3. For example,
if there is a distinct trough at $ssr(132)$, the consistent estimate of the threshold is $y^{132}$.
After all, $\tau = y^{132}$ results in the TAR model with the best fit. If there are two troughs,
there are two potential thresholds. To explain in a bit more detail, consider the model
that was used to generate the 200 values shown in Figure 7.3:

$$y_t = 0.3 I_t y_{t-1} + 0.7 (1 - I_t) y_{t-1} + \varepsilon_t$$

where $I_t = 1$ if $y_{t-1} > 0$ and $I_t = 0$ otherwise.

Figure 7.4 reproduces the same numerical values sorted from low to high. As
shown in both figures, the first value lying within the 70% band (i.e., $y^{30}$) is $-1.623$.
Hence, the first estimate of the TAR model uses $\tau = -1.623$. The second estimation
uses the next sorted observation as the threshold. This second value happens to be
$-1.601$; thus, the second estimation uses $\tau = -1.601$. Continuing in this fashion brings
us closer to the true threshold value of zero. As such, the fit of the TAR model should
continue to improve as we move from threshold values of $-1.623$ towards $\tau = 0$. However,
once we cross the true threshold and use values of $\tau$ that are greater than zero,
the sum of squared residuals should begin to increase. As such, the plot of the residual
sum of squares should reach a minimum at $\tau = 0$. If you examine Figure 7.4, you can
see that $\tau = 0$ corresponds to $y^{132}$. You can reproduce these results using the data on
the file SIM_TAR.XLS; respectively, the second and third columns of the file contain
the 200 values of the simulated series along with their sorted values.

What would happen if the true model contained two thresholds? In particular,
suppose one threshold is $-1$ and the other is zero. As you can see from Figure 7.4,
$y^{50} = -1$ and $y^{132} = 0$. Now consider the idealized plot of the residual sum of
squares shown in Figure 7.5. As we estimate TAR models beginning with $y^{30}$ and proceed
toward $y^{50}$, the sum of squared residuals declines. Also depicted is the fact that the
sum of squared residuals should begin to increase as we use threshold values in excess
of $-1$. This increase continues until you near the second threshold. In the example at
hand, the sum of squared residuals begins another decline as we approach the second
threshold of 0. The second trough at ordered observation 132 indicates the second
threshold $\tau = 0$. In order to estimate a two-threshold model, many researchers would
simply use the trough values as shown in Figure 7.5. In practice, there is a degree of
subjectivity since the troughs might not be as distinct as those shown in the figure.


FIGURE 7.4 Ordered Threshold Values (plotted series: Ordered, Lower 15%, Upper 15%)

FIGURE 7.5 Values of Sum of Squared Residuals


What might appear to be a trough to one researcher might appear to be a small decline 
to another. 


Threshold Regression Models 


It has also become popular to use a threshold in the context of a traditional regression
model. Consider the following specification:


$$y_t = a_0 + (a_1 + b_1 I_t) x_t + \varepsilon_t$$


where $I_t = 1$ if $y_{t-1} > \tau$ and $I_t = 0$ otherwise.

Here, $a_1$ measures the effect of $x_t$ on $y_t$ when $y_{t-1} \le \tau$. However, when $y_{t-1} > \tau$,
the effect of $x_t$ on $y_t$ is $a_1 + b_1$. Hence, if $a_1$ and $b_1$ are positive, changes in $x_t$ have
a greater effect on $y_t$ when $y_{t-1} > \tau$ than when $y_{t-1} \le \tau$. You can estimate the value
of the threshold using Chan's method described above. For example, Shen and Hakes
(1995) estimate a nonlinear reaction function for the central bank of Taiwan. The idea is
that the central bank will respond differently to changes in economic variables in a high
inflationary environment than in a low inflationary environment. Similarly, Galbraith
(1996) shows that for Canada and the United States, the effect of money on output
depends on whether credit conditions are already tight or loose.

There is no requirement that the threshold variable be given by $y_{t-1}$. For example,
the threshold variable can be $x_{t-d}$, where the delay parameter $d$ is any nonnegative integer.
The threshold variable can even be a variable that does not appear directly in the
regression equation. Two examples of threshold regression models are provided in Section 6.


Pretesting for a TAR Model 


A Lagrange multiplier test cannot be used for a threshold model since it is not
differentiable. For example, suppose you have a TAR model in the form


$$y_t = I_t(\alpha_{10} + \alpha_{11} y_{t-1}) + (1 - I_t)(\alpha_{20} + \alpha_{21} y_{t-1}) + \varepsilon_t \qquad (7.16)$$


where $I_t = 1$ if $y_{t-1} > \tau$ and $I_t = 0$ if $y_{t-1} \le \tau$.

It should be clear that the model is not differentiable at $\tau$. For example, the
derivative $\partial y_t/\partial \alpha_{11}$ is discontinuous at $\tau$ in that $\partial y_t/\partial \alpha_{11} = y_{t-1}$ when $y_{t-1} > \tau$ and
$\partial y_t/\partial \alpha_{11} = 0$ when $y_{t-1} \le \tau$. Nevertheless, the appropriate test for threshold behavior
is straightforward if the threshold value is known. Under the null hypothesis of
linearity, (7.16) is the AR(1) process


$$y_t = \alpha_{10} + \alpha_{11} y_{t-1} + \varepsilon_t$$

As such, it is possible to estimate (7.16) and use a standard F-test to determine
whether $\alpha_{10} = \alpha_{20}$ and $\alpha_{11} = \alpha_{21}$. However, if the threshold is unknown, another
method must be used since you have searched over all values of $\tau$ to estimate the
values of $\alpha_{10}$, $\alpha_{11}$, $\alpha_{20}$, and $\alpha_{21}$. You need to account for the fact that the search over all
potential values of $\tau$ makes the fit of the regression as good as possible. Hence, the
sample value of F will be overly large.

To make the point in a different fashion, under the null hypothesis that the model is
linear (i.e., $\alpha_{1i} = \alpha_{2i}$), the estimate of $\tau$ can take on any value, so that $\tau$ is an
unidentified nuisance parameter.

Following Davies (1987), the test for a threshold model can be conducted using a
supremum test. In fact, Hansen (1997) shows how to obtain the appropriate critical
values using a bootstrapping procedure. Search over all values of $\tau$ to find
the best-fitting TAR model. Let $SSR_u$ denote the unrestricted sum of squared residuals
from the estimated threshold model. Similarly, let $SSR_r$ denote the sum of squared
residuals obtained from restricting the model to be linear. If you have $T$ usable observations,
a traditional F-statistic could be constructed as

$$F = \frac{(SSR_r - SSR_u)/n}{SSR_u/(T - 2n)}$$


where $n$ is the number of parameters estimated in the linear version of the model. In
the example at hand, $n = 2$ (the intercept and the coefficient on $y_{t-1}$).

However, this sample value of F cannot be compared to the critical value found in
a table for F. Instead, to use Hansen's (1997) bootstrapping method, you need to draw
$T$ normally distributed random numbers with a mean of zero and a variance of unity;
let $e_t$ denote this set of random numbers. (Alternatively, you can bootstrap the test by
using random draws of the residuals from the linear model.) Estimate the auxiliary
regression by regressing $e_t$ on a constant and $y_{t-1}$; call the sum of squared residuals
$SSR_r^*$. Also, for each potential threshold, regress $e_t$ on $I_t$, $(1 - I_t)$, $I_t y_{t-1}$, and
$(1 - I_t) y_{t-1}$ and use the regression providing the best fit. Call the sum of squared
residuals from this supremum regression $SSR_u^*$. Use the two sums of squares to form

$$F^* = \frac{(SSR_r^* - SSR_u^*)/n}{SSR_u^*/(T - 2n)}$$

Repeat this process several thousand times to obtain the distribution of $F^*$. If the
value of F from your sample exceeds the 95th percentile for $F^*$, you can reject the null
hypothesis of linearity at the 5% significance level.
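
The bootstrap just described can be sketched as follows. This is a minimal illustration, not Hansen's own code: it assumes the AR(1) threshold model in (7.16) with $y_{t-1}$ as the threshold variable, standard normal draws for $e_t$, 15% trimming of the candidate thresholds, and hypothetical function names (simple_ssr, sup_f, bootstrap_linearity_test).

```python
import random

def simple_ssr(xs, ys):
    """SSR from an OLS regression of ys on a constant and xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    syy = sum((v - my) ** 2 for v in ys)
    return syy if sxx == 0.0 else syy - sxy ** 2 / sxx

def sup_f(lag, dep, trim=0.15, n=2):
    """F-statistic for linearity computed at the best-fitting threshold."""
    T = len(dep)
    ssr_r = simple_ssr(lag, dep)                 # restricted (linear) model
    k = int(trim * T)
    ssr_u = float("inf")
    for tau in sorted(lag)[k:T - k]:             # candidate thresholds
        hi = [(x, v) for x, v in zip(lag, dep) if x > tau]
        lo = [(x, v) for x, v in zip(lag, dep) if x <= tau]
        ssr_u = min(ssr_u, simple_ssr(*zip(*hi)) + simple_ssr(*zip(*lo)))
    return ((ssr_r - ssr_u) / n) / (ssr_u / (T - 2 * n))

def bootstrap_linearity_test(y, reps=1000, seed=0):
    """Return the sample F and the share of bootstrap F* values at least as large."""
    rng = random.Random(seed)
    lag, dep = y[:-1], y[1:]
    f_sample = sup_f(lag, dep)
    f_star = [sup_f(lag, [rng.gauss(0.0, 1.0) for _ in dep]) for _ in range(reps)]
    p_value = sum(f >= f_sample for f in f_star) / reps
    return f_sample, p_value
```

Calling bootstrap_linearity_test on a list of observations returns the sample F together with a bootstrap prob-value; rejecting linearity when that value is below 0.05 corresponds to comparing F with the 95th percentile of $F^*$.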

The method generalizes to testing the null hypothesis of a linear model against the
alternative of (7.14). Create $SSR_u$ by estimating (7.14) and $SSR_r$ by estimating the linear
model that constrains all values of $\alpha_{1i} = \alpha_{2i}$. Obtain $SSR_r^*$ by regressing $e_t$ on all of the
regressors in the linear model and $SSR_u^*$ by regressing $e_t$ on all of the regressors in
(7.14). In obtaining $SSR_u^*$, be sure to search for the best-fitting threshold. After several
thousand replications, you should have a good approximation to the distribution of $F^*$.
A number of software packages can readily perform such a test. A detailed example of
the testing procedure is provided immediately below.


TAR Models and Endogenous Breaks 


If you have been paying careful attention, you might have recognized that the threshold
model is equivalent to a model with a structural break. The only difference is
that in a model with structural breaks, time is the threshold variable. In Chapter 2
(see Figure 2.10 and the file Y_BREAKS.XLS), we analyzed the simulated series
$y_t = 1 + 0.5y_{t-1} + \varepsilon_t$ for $1 \le t \le 100$ and $y_t = 2.5 + 0.65y_{t-1} + \varepsilon_t$ for $101 \le t \le 150$.
When we treated the break date as known, we were able to form the dummy variable
$D_t$ and the variable $D_t y_{t-1}$ and estimate:


$$y_t = \underset{(7.22)}{1.6015} + \underset{(2.76)}{0.2545}\,y_{t-1} - \underset{(-0.39)}{0.2244}\,D_t + \underset{(4.47)}{0.5433}\,D_t y_{t-1}$$


where $D_t = 1$ if $t < 101$ and $D_t = 0$ otherwise. Since the coefficient on $D_t y_{t-1}$ was
highly significant, we were able to verify the presence of a break in the series. Of
course, this model of a breaking series is equivalent to the threshold form


$$y_t = (\underset{(2.60)}{1.3771} + \underset{(10.10)}{0.7977}\,y_{t-1}) I_t + (1 - I_t)(\underset{(7.22)}{1.6015} + \underset{(2.72)}{0.2545}\,y_{t-1})$$


where $I_t = 1$ if $t < 101$ and $I_t = 0$ otherwise.

If we pretend that the break date is unknown, we can illustrate the use of a supremum
test. It turns out that $\tau = 100$ yields the model with the smallest sum of squared
residuals. Using this value as the threshold, the sum of squared residuals is 138.63. If
you estimate the model under the null hypothesis of linearity, you should find


(2.64) (22.76) 


The sum of squared residuals is 195.18. Since there are 149 usable observations
and 2 extra coefficients in the threshold model, the sample value of the F-statistic is


$$F = \frac{(195.18 - 138.63)/2}{138.63/(149 - 4)} = 29.57$$
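
As a quick arithmetic check, the calculation can be reproduced directly. The helper name f_statistic is hypothetical; it simply codes the formula for F given earlier, with $n = 2$ parameters in the linear model.

```python
def f_statistic(ssr_r, ssr_u, n, T):
    """F = [(SSR_r - SSR_u)/n] / [SSR_u/(T - 2n)]."""
    return ((ssr_r - ssr_u) / n) / (ssr_u / (T - 2 * n))

# Values reported in the text for the Y_BREAKS example
f_break = f_statistic(ssr_r=195.18, ssr_u=138.63, n=2, T=149)
print(round(f_break, 2))  # 29.57
```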


Next, draw a sequence of 150 random numbers with a standard deviation of unity
to represent the $e_t$ series. Since the actual residuals may not be normal, you can instead
use random draws (with replacement) of the residuals from the linear model. Estimate the
auxiliary equation of the form $e_t = \alpha_0 + \alpha_1 y_{t-1} + v_t$. Next, for each $\tau$ in the interval
$22 \le \tau \le 128$, create the indicator function $I_t$ and estimate a threshold regression in the form


$$e_t = I_t(\alpha_{10} + \alpha_{11} y_{t-1}) + (1 - I_t)(\alpha_{20} + \alpha_{21} y_{t-1}) + \varepsilon_t$$
Use the best-fitting regression to construct the sample F statistic (i.e., construct
$F^*$) for the null hypothesis $\alpha_{10} = \alpha_{20}$ and $\alpha_{11} = \alpha_{21}$. Repeat this process several
thousand times to obtain the distribution of $F^*$. Compare your distribution to the value of
F = 29.57. If you perform this process using the data in the file Y_BREAKS.XLS,
approximately 95% of the constructed $F^*$ values should be below 3.15. As such, the
null hypothesis of linearity is clearly rejected.

There is a more general point to be made from this example. Carrasco (2002) shows
that the usual tests for structural breaks (i.e., those using dummy variables) have little
power if the data are actually generated by a threshold process. Her observation is that
the multiplicity of regime changes in a TAR model cannot be adequately captured by the
dummy variables. However, a test for a threshold process using $y_{t-1}$ as the threshold
variable has power to detect both threshold behavior and structural change. Even if
there is a single structural break at time period $t$, using $y_{t-1}$ as the threshold variable
will mimic this type of behavior. After all, if the series suddenly increases at $t$, values
of $y_{t-1}$ will tend to be low before date $t$ and high after date $t$. As such, she recommends
using the threshold model as a general test for parameter instability.


6. THREE THRESHOLD MODELS 


Perhaps the best way to understand the nature of threshold models is to consider a few 
specific examples. This section illustrates the estimation of a threshold autoregressive 
model and two threshold regression models. 


The Unemployment Rate 


FIGURE 7.6 The U.S. Unemployment Rate (vertical axis: Percent)

In addition to Rothman (1998), many papers have indicated that the U.S. unemployment
rate displays nonlinear behavior. You can follow along with the estimation process using the
data set UNRATE.XLS. Figure 7.6 shows the monthly values of the rate over the period
January 1960 through June 2013. In November 1982, the rate rose to as high as 10.8%,
although there were also sharp increases in 1970, 1973, 1991, 2001, and 2008. The mean
of the 642 values is 6.10% and the standard deviation is 1.61 percentage points. After
some experimentation, you can convince yourself that it is reasonable to difference the
series and estimate:


$$\Delta u_t = \underset{(0.09)}{0.0005} + \underset{(1.48)}{0.058}\,\Delta u_{t-1} + \underset{(5.88)}{0.228}\,\Delta u_{t-2} + \underset{(4.87)}{0.188}\,\Delta u_{t-3} + \underset{(3.59)}{0.140}\,\Delta u_{t-4} - \underset{(-3.58)}{0.128}\,\Delta u_{t-12} \qquad (7.17)$$

where SSR = 14.883, AIC = 1710.45, SBC = 1737.12.

The first 12 autocorrelations are


  ρ1     ρ2     ρ3     ρ4     ρ5     ρ6     ρ7     ρ8     ρ9     ρ10    ρ11    ρ12
 -0.01  -0.02  -0.02  -0.01   0.04   0.04  -0.01   0.03   0.02   0.00   0.07  -0.02


Since the intercept and the coefficient on $\Delta u_{t-1}$ are not significant, general practice
would be to re-estimate the model without these two terms.

The RESET is not supportive of nonlinearity. Let $e_t$ denote the regression residuals
from (7.17). If we regress the residuals on the regressors and the powers of the fitted
values, we obtain:


$$e_t = \underset{(-0.64)}{-0.006} + \underset{(1.15)}{1.59}\,\Delta\hat{u}_t^2 + \underset{(0.88)}{10.36}\,\Delta\hat{u}_t^3 - \underset{(-1.05)}{33.94}\,\Delta\hat{u}_t^4 + \sum_i a_i \Delta u_{t-i}, \qquad i = 1, 2, 3, 4, 12$$


The F-statistic for the restriction that the coefficients on $\Delta\hat{u}_t^2$, $\Delta\hat{u}_t^3$, and $\Delta\hat{u}_t^4$ jointly
equal zero is 1.42. With three degrees of freedom in the numerator and 620 in the
denominator, the prob-value is 0.234. Hence, the RESET does not detect the presence
of nonlinear behavior. Notice that the RESET has a very general alternative hypothesis;
as such, it does not have power against all types of nonlinearity. In particular, since the
test employs a smooth polynomial of the fitted values, it does not do especially well in
capturing asymmetric behavior.
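
The mechanics of the RESET can be sketched as below. This is an illustrative implementation that assumes NumPy is available and that the regressor matrix X already contains a constant column; the function names ols and reset_test are hypothetical. With the six regressors of (7.17), three added powers, and 629 usable observations, the denominator degrees of freedom work out to 629 - 9 = 620, as in the text.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def reset_test(X, y, max_power=4):
    """Regress the residuals on X and powers 2..max_power of the fitted values.
    Returns (F, numerator df, denominator df) for the added powers."""
    beta, e = ols(X, y)
    fitted = X @ beta
    powers = np.column_stack([fitted ** p for p in range(2, max_power + 1)])
    Z = np.column_stack([X, powers])
    _, v = ols(Z, e)
    ssr_r = float(e @ e)          # residuals are orthogonal to X, so this is the restricted SSR
    ssr_u = float(v @ v)
    q = powers.shape[1]
    df_den = len(y) - Z.shape[1]
    return ((ssr_r - ssr_u) / q) / (ssr_u / df_den), q, df_den
```

The returned F can be compared with an F distribution with the reported degrees of freedom.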

However, other diagnostic checks indicate a potential problem with the linear
specification. The McLeod-Li (1983) test is such that


$$e_t^2 = \underset{(8.68)}{0.018} + \underset{(3.59)}{0.143}\,e_{t-1}^2 + \underset{(2.40)}{0.096}\,e_{t-2}^2$$


The sample value of F for the restriction that the coefficients on $e_{t-1}^2$ and $e_{t-2}^2$ are
jointly equal to zero is 10.95; this value is highly significant. It is also interesting that
other variants of the test suggest nonlinearity. Consider the regression


$$e_t = \underset{(-1.11)}{-0.0078} + \underset{(2.30)}{0.3298}\,e_{t-1}^2 \qquad (7.18)$$


Equation (7.18) suggests that a large error (either positive or negative) in the previous
period is associated with a positive error in the current period. In a linear model, the
adjustment is symmetric so that the residuals should not be correlated with the lagged
squared residuals.
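
A regression-based version of the McLeod-Li check can be sketched as follows. The code assumes NumPy and a hypothetical function name (mcleod_li_f); it regresses the squared residuals on a constant and their own lags and forms the F-statistic for the joint restriction that all lag coefficients equal zero.

```python
import numpy as np

def mcleod_li_f(resid, lags=2):
    """F-statistic for H0: no dependence of e_t^2 on e_{t-1}^2, ..., e_{t-lags}^2."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[lags:]
    lagged = [e2[lags - k:-k] for k in range(1, lags + 1)]
    X = np.column_stack([np.ones(len(y))] + lagged)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    ssr_u = float(u @ u)
    ssr_r = float(((y - y.mean()) ** 2).sum())   # constant-only regression
    df_den = len(y) - X.shape[1]
    return ((ssr_r - ssr_u) / lags) / (ssr_u / df_den), beta
```

With lags=2, the first returned value is the analogue of the F reported above, and the second holds the intercept and the two lag coefficients.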

If you set $d = 1$ and estimate a model in the form of (7.14), you should find that the
threshold value yielding the smallest residual sum of squares is $\tau = 0.070$.

FIGURE 7.7 SSR and the Potential Thresholds


Figure 7.7 shows the value of the sum of squared residuals for each threshold value
considered. You can see the trough in the scatter plot of ssr($\tau$) occurring at $\tau = 0.070$.
Although there is a second trough in the scatter plot near $\tau = 0.025$, the two troughs
are reasonably close together so that it makes sense to ignore the possibility of multiple
thresholds.

Also note that other delay parameters do not fare as well as $d = 1$. For example, the
residual sums of squares with $d = 1$, 2, and 3 are 14.296, 14.319, and 14.385, respectively.
(The estimated values of $\tau$ for $d = 2$ and 3 are 0.022 and $-0.029$, respectively.)
Hence, we can be confident that a delay parameter of unity is appropriate.

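If you set $d = 1$ and $\tau = 0.07$, you should find


$$\begin{aligned}
\Delta u_t = {} & I_t(\underset{(-3.28)}{-0.070} + \underset{(3.84)}{0.381}\,\Delta u_{t-1} + \underset{(5.22)}{0.345}\,\Delta u_{t-2} + \underset{(1.90)}{0.126}\,\Delta u_{t-3} + \underset{(1.25)}{0.084}\,\Delta u_{t-4} - \underset{(-2.08)}{0.148}\,\Delta u_{t-12}) \\
& + (1 - I_t)(\underset{(-0.47)}{-0.004} - \underset{(0.57)}{0.039}\,\Delta u_{t-1} + \underset{(2.48)}{0.122}\,\Delta u_{t-2} + \underset{(3.73)}{0.179}\,\Delta u_{t-3} + \underset{(3.35)}{0.159}\,\Delta u_{t-4} - \underset{(-3.09)}{0.126}\,\Delta u_{t-12})
\end{aligned}$$


SSR = 14.296, AIC = 1697.12, SBC = 1750.45


where $I_t = 1$ when $\Delta u_{t-1} > 0.07$ and $I_t = 0$ when $\Delta u_{t-1} \le 0.07$.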

Note that the AIC selects the threshold model while the SBC selects the linear 
model in (7.17). However, the threshold model contains a number of parameters that 
are small relative to their standard errors. Clearly, it makes sense to test for the presence 
of threshold behavior. You might construct the sample F-statistic for the null hypothesis 
of linearity as 


$$F = \frac{(14.883 - 14.296)/6}{14.296/(629 - 12)} = 4.22$$
However, as it was necessary to estimate the threshold value, it is not appropriate to
compare 4.22 to a standard table of F. Instead, if you use Hansen's (1997) bootstrapping
method, you should find that it is possible to reject the null hypothesis of linearity at
the 0.0025 level. Hence, we can conclude there is threshold behavior. Inference on
the coefficients in a threshold model is not straightforward since it was necessary to
search for $\tau$. The t-statistics yield only an approximation of the actual significance
levels of the coefficients. The problem is that the coefficients on the various $\Delta u_{t-i}$ are
multiplied by $I_t$ or $(1 - I_t)$ and that these values are dependent on the estimated value
of $\tau$. Nevertheless, both model selection criteria indicate that you can pare down the
model by eliminating $I_t\Delta u_{t-4}$, the intercept in the negative regime, and $(1 - I_t)\Delta u_{t-1}$.
Also note that the coefficients on $I_t\Delta u_{t-12}$ and $(1 - I_t)\Delta u_{t-12}$ are almost identical.
Thus, it makes sense to simply include $\Delta u_{t-12}$ in the model. Paring down the model in
this fashion results in:


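$$\begin{aligned}
\Delta u_t = {} & I_t(\underset{(-3.19)}{-0.069} + \underset{(3.88)}{0.387}\,\Delta u_{t-1} + \underset{(6.22)}{0.376}\,\Delta u_{t-2} + \underset{(1.99)}{0.130}\,\Delta u_{t-3}) \\
& + (1 - I_t)(\underset{(3.21)}{0.155}\,\Delta u_{t-2} + \underset{(3.97)}{0.188}\,\Delta u_{t-3}) - \underset{(-3.49)}{0.124}\,\Delta u_{t-12}
\end{aligned}$$


AIC = 1700.38 and SBC = 1731.49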


The point estimates are such that there is far more persistence when $\Delta u_{t-1} > \tau$
than when $\Delta u_{t-1} \le \tau$. This result strongly suggests that increases in unemployment
are far more persistent than decreases in unemployment. As an exercise, try to verify
these results. You might also find it interesting to estimate the series using the threshold
value near 0.025.


Asymmetric Monetary Policy 


Much of the literature concerning the behavior of the Federal Reserve is based on the
type of feedback rule introduced by Taylor (1993). The so-called Taylor rule has the
form


$$i_t = \gamma_0 + \pi_t + \alpha_1(\pi_t - \pi^*) + \beta y_t + \gamma_1 i_{t-1} + \varepsilon_t$$

or, setting $\alpha_0 = \gamma_0 - \alpha_1\pi^*$ and $\alpha = 1 + \alpha_1$, we can form

$$i_t = \alpha_0 + \alpha\pi_t + \beta y_t + \gamma_1 i_{t-1} + \varepsilon_t$$


where $i_t$ is the nominal federal funds rate, $\pi_t$ is the inflation rate over the last four
quarters, $\pi^*$ is the target inflation rate, $y_t$ is the output gap measured as the percentage
deviation of real GDP from its trend, and $\alpha_1$, $\beta$, $\gamma_0$, and $\gamma_1$ are positive parameters.

The intuition behind the rule is that the Federal Reserve wants to keep inflation 
at the target level and to stabilize real GDP around its trend. Since high interest rates 
discourage spending, the Taylor rule posits that the Federal Reserve will increase i, 
when inflation is above its target level and when the output gap is positive. The lagged 
value of the interest rate creates some inertia in the system and represents the desire of 
the Federal Reserve to smooth interest rate changes over time. 
In Bunzel and Enders (2010), we created the data in the file labeled TAYLOR.XLS
containing the variables necessary to estimate the Taylor rules reported below.
Specifically, the interest rate ($i_t$) is the quarterly average of the monthly values of the
federal funds rate. The four-quarter inflation rate ($\pi_t$) is constructed as:


$$\pi_t = 100 \times (\ln p_t - \ln p_{t-4})$$


where $p_t$ is the chain-weighted GDP deflator.

In order to account for the fact that real GDP is often subject to substantial revisions,
it is standard to use the real-time values of GDP available at the Philadelphia Federal
Reserve Bank's website (http://research.stlouisfed.org/fred2). The notion is that
the Federal Reserve makes decisions using the then-current values of GDP. Revised values
are only available after a substantial delay. The output gap is obtained by detrending
the real output data with a Hodrick-Prescott (HP) filter as described in Chapter 4.
Specifically, beginning with $t$ = 1963Q2, the HP filter is applied to the real-time output
series running from 1947Q1 through $t$. The filtered series represents the trend values
of real GDP. Take the last observation of the filtered series as the trend value for period $t$.
We construct the output gap for time period $t$ ($y_t$) as the percentage difference between
real-time output at $t$ and this trend value. We then increase $t$ by one period and repeat
the process. The aim is not to ascertain the way that real output evolves over the long run.
Instead, the goal is to obtain a reasonable measure of the pressure felt by the Federal
Reserve to use monetary policy to affect the level of output.
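
The recursive construction of the gap can be sketched as follows. This is only an illustration: it assumes NumPy, a smoothing parameter of 1600 (the conventional quarterly value, which the text does not state), the filter applied to the level of output, and hypothetical names (hp_trend, recursive_output_gap). Each element of vintages is taken to be the output series, from the first period through period $t$, as it was known at period $t$.

```python
import numpy as np

def hp_trend(y, lam=1600.0):
    """HP trend: solves (I + lam * D'D) trend = y, where D takes second differences."""
    y = np.asarray(y, dtype=float)
    T = y.size
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i:i + 3] = (1.0, -2.0, 1.0)
    return np.linalg.solve(np.eye(T) + lam * (D.T @ D), y)

def recursive_output_gap(vintages, lam=1600.0):
    """For each real-time vintage, filter the whole series and keep only the
    percentage gap between the last observation and the last trend value."""
    gaps = []
    for series in vintages:
        series = np.asarray(series, dtype=float)
        trend_end = hp_trend(series, lam)[-1]
        gaps.append(100.0 * (series[-1] - trend_end) / trend_end)
    return np.array(gaps)
```

Because only the end point of each filtered vintage is used, the resulting gap series relies solely on information available at each date.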

In applied work, it is typical to estimate the Taylor rule over a number of sample
periods reflecting the fact that a change in the Federal Reserve's operating procedures
occurred in 1979Q4, the Volcker disinflation ended by 1983Q1, Alan Greenspan became
Fed Chairman in August 1987, and Ben Bernanke became Chairman in February 2006.
Consider the estimated model for the 1979Q4-2007Q3 sample period:


$$i_t = \underset{(-1.47)}{-0.269} + \underset{(6.05)}{0.464}\,\pi_t + \underset{(5.16)}{0.345}\,y_t + \underset{(21.83)}{0.810}\,i_{t-1}$$

AIC = 500.75 and SBC = 511.63


The estimated model appears to be reasonable in that the coefficients on inflation
and the output gap are both positive and significant at conventional levels. The
coefficient on the lagged interest rate (i.e., $\gamma_1 = 0.810$) suggests a substantial amount
of interest rate smoothing. In the long run, $i_t$ responds more than proportionally to
changes in $\pi_t$ [since 0.464/(1 - 0.810) = 2.44] so that the real interest rate rises (falls)
when inflation increases (decreases).
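
The long-run figure can be derived in one step. The sketch below assumes the rule settles at a steady state in which $i_t = i_{t-1} = \bar{i}$ with inflation and the output gap held fixed; the bar notation is introduced here only for illustration.

```latex
\begin{aligned}
\bar{i} &= \alpha_0 + \alpha\,\pi + \beta\,y + \gamma_1 \bar{i}
\;\Longrightarrow\;
\bar{i} = \frac{\alpha_0 + \alpha\,\pi + \beta\,y}{1 - \gamma_1}, \\
\frac{\partial \bar{i}}{\partial \pi} &= \frac{\alpha}{1 - \gamma_1}
= \frac{0.464}{1 - 0.810} \approx 2.44 .
\end{aligned}
```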

A number of authors have questioned the linear form of the Taylor rule and have
argued that the Federal Reserve's reactions to $\pi_t$ and $y_t$ are best modeled as a nonlinear
process. For example, it is likely that the Federal Reserve prefers inflation to be below
the target rather than above it. Moreover, it is probable that the Federal Reserve
prefers a positive output gap to a negative one.

The point is that interest rate changes should be more dramatic when inflation is
high and/or output is low. As such, it seems natural to estimate the Taylor rule as a
threshold regression using either the inflation rate or the output gap as the threshold
variable. Since we do not know the delay factor, we can estimate four threshold
regressions with $\pi_{t-1}$, $\pi_{t-2}$, $y_{t-1}$, and $y_{t-2}$ as the threshold variables. For each regression, the
consistent estimate of $\tau$ is obtained using a grid search over all potential thresholds
using a trimming value of 15%. The estimated threshold value, sum of squared
residuals (SSR), AIC, and SBC for each of the four regressions are


Threshold variable      τ        SSR      AIC      SBC
$\pi_{t-1}$              3.527    50.80    455.93   477.67
$\pi_{t-2}$              3.527    50.42    455.08   476.83
$y_{t-1}$               -1.183    63.97    481.75   503.49
$y_{t-2}$               -1.565    53.41    461.53   483.28


Notice that all of the threshold regressions have a better fit than the linear model.
Moreover, if you bootstrap the sample F-statistics, you will find that all are highly
significant. Nevertheless, since $\pi_{t-2}$ provides the best fit, we should use it as the threshold
variable. As such, the estimated Taylor rule is


$$i_t = \underset{(3.02)}{1.383} + \underset{(10.56)}{1.055}\,\pi_t + \underset{(6.25)}{0.472}\,y_t + \underset{(5.75)}{0.374}\,i_{t-1} \quad \text{when } \pi_{t-2} > 3.527$$


and


$$i_t = \underset{(-1.39)}{-0.440} + \underset{(1.88)}{0.227}\,\pi_t + \underset{(8.85)}{0.305}\,y_t + \underset{(24.98)}{0.967}\,i_{t-1} \quad \text{when } \pi_{t-2} \le 3.527$$


Notice the coefficients on $\pi_t$ and $y_t$ are much greater in the high-inflation regime
than in the low-inflation regime. Moreover, the interest rate smoothing coefficient is
far greater when inflation is low than when inflation is high. In essence, in the high-inflation
regime, the Federal Reserve is far more policy active than in the low-inflation
regime. Also notice that the linear variant of the rule seems to 'average' the responses
of the Federal Reserve across the high- and low-inflation regimes.
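
To see how the two regimes operate, the estimated rule can be coded as a simple function. The coefficients are the point estimates reported above; the function name threshold_taylor_rate and the input values in the example call are hypothetical and chosen only for illustration.

```python
def threshold_taylor_rate(pi_t, gap_t, i_lag, pi_lag2, tau=3.527):
    """Fitted federal funds rate from the two-regime Taylor rule.
    The regime is selected by inflation two quarters earlier (pi_lag2)."""
    if pi_lag2 > tau:   # high-inflation regime
        return 1.383 + 1.055 * pi_t + 0.472 * gap_t + 0.374 * i_lag
    # low-inflation regime
    return -0.440 + 0.227 * pi_t + 0.305 * gap_t + 0.967 * i_lag

# Same current conditions, different regimes (illustrative inputs only)
high = threshold_taylor_rate(pi_t=4.0, gap_t=0.0, i_lag=5.0, pi_lag2=4.0)
low = threshold_taylor_rate(pi_t=4.0, gap_t=0.0, i_lag=5.0, pi_lag2=3.0)
print(round(high, 3), round(low, 3))
```

Holding the current inputs fixed, the two calls differ only in which regime the lagged inflation rate selects.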


Capital Stock Adjustment with Multiple Thresholds 


Boetel, Hoffman and Liu (2007) estimate an interesting model that contains three
regimes. The problem addressed in the paper is that pork producers do not always
adjust their capital input in the face of changing market conditions. However, there are
times when even a very small change in market conditions induces a large adjustment
in the capital stock. Their model asserts that there is a 'normal' range for the price
of hogs and that price changes within this range will induce a sluggish investment
response. For our purposes, the key variables in the model are


$$K_t - K_{t-1} = \underset{(3.30)}{4569} + \underset{(5.59)}{6360}\,I_{1t} + \underset{(5.20)}{6352}\,I_{2t} + \underset{(1.84)}{452}\,p_{h,t-1} - \underset{(-3.66)}{2684}\,p_{f,t-1} + \cdots + \varepsilon_t$$


where $K_t$ is the size of the breeding stock, $p_{h,t-1}$ is a measure of the output price of
hogs, and $p_{f,t-1}$ is a measure of the price of feed. The indicator functions are such
that $I_{1t} = 1$ if $p_{h,t-1} > \tau_{high} = 1.1185$ and $I_{2t} = -1$ if $p_{h,t-1} < \tau_{low} = 1.1105$. The use
of lagged values for the explanatory variables is designed to reflect a one-period delay
between the time of the investment decision and its realization.

It should not be surprising that the net acquisition of the breeding stock ($K_t - K_{t-1}$)
is positively related to the price of hogs and negatively related to the price of feed. An
appealing feature of the model is that the indicator functions multiply the intercepts but
not the variable $p_{h,t-1}$. Boetel, Hoffman and Liu (2007) note that allowing all variables
to have asymmetric effects on $K_t - K_{t-1}$ would entail estimating a large number of
parameters with a consequent loss of degrees of freedom.

Notice that the three regimes are distinguished by the value of $p_{h,t-1}$ relative to the two
threshold values. When $p_{h,t-1}$ is between $\tau_{low}$ and $\tau_{high}$, $I_{1t} = I_{2t} = 0$ so that the
intercept is 4569. Instead, when $p_{h,t-1} > \tau_{high}$, $I_{1t} = 1$ and the intercept is 10929
(= 4569 + 6360), and when $p_{h,t-1} < \tau_{low}$, $I_{2t} = -1$ and the intercept is $-1783$
(= 4569 - 6352). Thus, there is a high-investment, a sluggish, and a disinvestment
regime whose presence is dependent on the value of $p_{h,t-1}$. As such, it would be
a mistake to conclude that the slope coefficient 452 measures the full effect of a price
change on net investment. When the value of $p_{h,t-1}$ crosses one of the thresholds, the
change in investment is enhanced since the intercept changes along with the price. Also
note that price changes within the interval $\tau_{low}$ to $\tau_{high}$ will have little effect on investment.
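
The regime-dependent intercept follows directly from the reported coefficients and thresholds, as the short sketch below shows. The function name regime_intercept and the three prices in the loop are hypothetical.

```python
def regime_intercept(p_hog_lag, tau_high=1.1185, tau_low=1.1105):
    """Intercept of the investment equation implied by the lagged hog price."""
    i1 = 1 if p_hog_lag > tau_high else 0     # high-investment regime indicator
    i2 = -1 if p_hog_lag < tau_low else 0     # disinvestment regime indicator
    return 4569 + 6360 * i1 + 6352 * i2

for price in (1.20, 1.115, 1.00):
    print(price, regime_intercept(price))
```

The three prices fall above the upper threshold, between the thresholds, and below the lower threshold, so the printed intercepts correspond to the three regimes.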

Boetel, Hoffman and Liu (2007) use a different method than the one described
above to estimate the two threshold values appearing in their model. First, they perform
a grid search to find the single threshold value that provides the smallest value of the sum
of squared residuals. Let $\tau_1$ denote this threshold value. Next, maintaining the value of
$\tau_1$, they estimate a second threshold (say, $\tau_2$) so as to further minimize the residual
sum of squares. Although Hansen (1999) shows that this second threshold estimate is
efficient, the first is not since it was estimated in the absence of the second threshold.
Finally, they fix the value of $\tau_2$ and reestimate the threshold value of $\tau_1$ so as to provide
the smallest value of the sum of squared residuals. An alternative would have been to
use the graphical method discussed in Section 5.


7. SMOOTH TRANSITION MODELS 


For some processes, it may not seem reasonable to assume that the threshold is sharp.
Instead, the speed of adjustment may be the type of nonlinear process shown in Panel
(b) of Figure 7.1. Smooth transition autoregressive (STAR) models allow the
autoregressive parameters to change slowly. Consider the special NLAR model given by:


$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \beta_1 y_{t-1} f(y_{t-1}) + \varepsilon_t$$

If $f(\cdot)$ is a smooth continuous function, the autoregressive coefficient $\alpha_1 + \beta_1 f(y_{t-1})$
will change smoothly along with the value of $y_{t-1}$. There are two particularly useful
forms of the STAR model that allow for a varying degree of autoregressive decay. The
logistic version of the STAR model (called the LSTAR model) generalizes the standard
autoregressive model such that the autoregressive coefficient is a logistic function:


$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + \theta[\beta_0 + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p}] + \varepsilon_t \qquad (7.19)$$


where

$$\theta = [1 + \exp(-\gamma(y_{t-1} - c))]^{-1} \qquad (7.20)$$
Note that $\gamma$ is called the smoothness parameter. In the limit as $\gamma \to 0$,
the LSTAR model becomes an AR($p$) model since the value of $\theta$ is constant ($\theta = 1/2$);
as $\gamma \to \infty$, $\theta$ approaches a step function of $y_{t-1} - c$, so that the model approaches a
TAR model. For intermediate values of $\gamma$, the degree of autoregressive decay depends on
the value of $y_{t-1}$. As $y_{t-1} \to -\infty$, $\theta \to 0$ so that the behavior of $y_t$ is given by
$\alpha_0 + \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + \varepsilon_t$. Similarly, as $y_{t-1} \to +\infty$, $\theta \to 1$ so that the behavior
of $y_t$ is given by $(\alpha_0 + \beta_0) + (\alpha_1 + \beta_1)y_{t-1} + \cdots + \varepsilon_t$. Thus, the intercept and the
autoregressive coefficients smoothly change between these two extremes as the value
of $y_{t-1}$ changes.

The exponential form of the model (ESTAR) uses (7.19), but replaces (7.20) with


$$\theta = 1 - \exp[-\gamma(y_{t-1} - c)^2], \qquad \gamma > 0.$$


Notice that $\theta$ contains a squared term so that the coefficients for the ESTAR model
are symmetric around $y_{t-1} = c$. As $y_{t-1}$ approaches $c$, $\theta$ approaches 0 so that the behavior
of $y_t$ is given by $\alpha_0 + \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + \varepsilon_t$. As $y_{t-1}$ moves further from $c$, $\theta$
approaches 1 so that the behavior of $y_t$ is given by $(\alpha_0 + \beta_0) + (\alpha_1 + \beta_1)y_{t-1} + \cdots + \varepsilon_t$.
The ESTAR model has proven to be useful for periods surrounding the turning points
of a series (i.e., periods in which $(y_{t-1} - c)^2$ will be extreme) in that such periods have
different degrees of autoregressive decay than others. Since the ESTAR model is symmetric
around $y_{t-1} = c$, it can approximate gravitational attraction as in Figure 7.1. Also
note that as $\gamma$ approaches zero or infinity, the model becomes an AR($p$) model since $\theta$
is constant. Otherwise, the model displays nonlinear behavior.

You can see the difference between the LSTAR and ESTAR models by examining
Figure 7.8. The top panel constructs $\theta = [1 + \exp(-\gamma(y_{t-1} - c))]^{-1}$ for $c = 0$ and values
of $\gamma = 1$ and 2. As $y_{t-1}$ ranges from $-5$ to $+5$, the value of $\theta$ ranges from 0 to 1. Note
that the S-shape of the transition is sharper, the greater is $\gamma$. For large values of $\gamma$, the
adjustment is so sharp that the LSTAR model acts as a TAR process. The bottom panel also
uses $c = 0$ and values of $\gamma = 1$ and 2, but constructs the transition function using the
ESTAR formula $\theta = 1 - \exp[-\gamma(y_{t-1} - c)^2]$. You can see that the U-shape becomes
sharper as $\gamma$ increases.
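
The two transition functions are easy to tabulate. The sketch below uses only the two formulas given in the text, with $c = 0$ and $\gamma = 1$ and 2 as in the figure; the function names lstar_theta and estar_theta are hypothetical.

```python
import math

def lstar_theta(y_lag, gamma, c=0.0):
    """Logistic transition: theta = [1 + exp(-gamma*(y_lag - c))]^(-1)."""
    return 1.0 / (1.0 + math.exp(-gamma * (y_lag - c)))

def estar_theta(y_lag, gamma, c=0.0):
    """Exponential transition: theta = 1 - exp(-gamma*(y_lag - c)^2)."""
    return 1.0 - math.exp(-gamma * (y_lag - c) ** 2)

if __name__ == "__main__":
    print(" y(t-1)  LSTAR g=1  LSTAR g=2  ESTAR g=1  ESTAR g=2")
    for y_lag in range(-5, 6):
        row = [lstar_theta(y_lag, 1.0), lstar_theta(y_lag, 2.0),
               estar_theta(y_lag, 1.0), estar_theta(y_lag, 2.0)]
        print(f"{y_lag:7d}" + "".join(f"{v:11.4f}" for v in row))
```

The LSTAR columns rise monotonically through 0.5 at $y_{t-1} = c$, while the ESTAR columns equal zero at $y_{t-1} = c$ and are symmetric around it.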

Michael, Nobay, and Peel (1997) make the point that transaction costs are an
important feature of international trade. Such costs may include the purchase of foreign
exchange or forward cover, the payment of tariffs and import licensing fees, and
transportation costs. As in the band-TAR model, small deviations from PPP will not
be corrected through the process of commodity arbitrage. Larger discrepancies are
expected to be mean-reverting such that the speed of adjustment is an increasing function
of the size of the discrepancy. The idea is that very large discrepancies are quickly
eliminated but mid-size discrepancies are eliminated more slowly.

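This type of behavior can be captured by an ESTAR process. The particular form
of the ESTAR model they consider is


$$\Delta y_t = \alpha_0 + a_1 y_{t-1} + \sum_{i=1}^{p-1} \alpha_i \Delta y_{t-i} + \left[1 - \exp(-\gamma(y_{t-d} - c)^2)\right]\left(\beta_0 + b_1 y_{t-1} + \sum_{i=1}^{p-1} \beta_i \Delta y_{t-i}\right) + \varepsilon_t$$


where $y_t$ is the real exchange rate.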
FIGURE 7.8 A Comparison of θ Values in the LSTAR and ESTAR Models


When $y_{t-d} = c$, the adjustment process is given by


$$\Delta y_t = \alpha_0 + a_1 y_{t-1} + \sum_{i=1}^{p-1} \alpha_i \Delta y_{t-i} + \varepsilon_t$$


and as $y_{t-d} \to \pm\infty$, the adjustment process is given by


$$\Delta y_t = (\alpha_0 + \beta_0) + (a_1 + b_1) y_{t-1} + \sum_{i=1}^{p-1} (\alpha_i + \beta_i) \Delta y_{t-i} + \varepsilon_t$$

The nature of transactions costs implies that $a_1$ may be very small (or zero). After
all, when $y_{t-d} \approx c$, there is little incentive to arbitrage the market. However, since large
deviations are mean reverting, $b_1$ should be negative. Their estimate of the monthly
United States-United Kingdom real exchange rate over the 1921M1 to 1925M5 period
is (with t-statistics in parentheses):


$$\Delta y_t = \underset{(3.37)}{0.40}\,\Delta y_{t-1} + \left[1 - \exp\left(-\underset{(2.44)}{532.4}\,(y_{t-1} - \underset{(7.21)}{0.038})^2\right)\right]\left(-y_{t-1} + \underset{(3.90)}{0.59}\,\Delta y_{t-2} + \underset{(2.89)}{0.57}\,\Delta y_{t-4} - \underset{(5.17)}{0.017}\right)$$

The point estimates imply that when the real rate is near 0.038, there is no tendency
for mean-reversion since $a_1 = 0$. However, when $(y_{t-1} - 0.038)^2$ is very large,
the speed of adjustment is quite rapid. Hence, the adjustment of the real
exchange rate is consistent with the presence of transaction costs.
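
A short calculation shows how quickly the transition weight grows with the size of the deviation. The sketch uses the two point estimates reported above (532.4 and 0.038); the function name estar_weight and the chosen deviations are hypothetical. Because the term in parentheses contains $-y_{t-1}$, the implied coefficient on $y_{t-1}$ in the equation for $\Delta y_t$ is the negative of this weight.

```python
import math

def estar_weight(y_lag, gamma=532.4, c=0.038):
    """Transition weight 1 - exp(-gamma*(y_lag - c)^2) at the reported estimates."""
    return 1.0 - math.exp(-gamma * (y_lag - c) ** 2)

for deviation in (0.0, 0.02, 0.05, 0.10):
    theta = estar_weight(0.038 + deviation)
    print(f"deviation from 0.038: {deviation:.2f}   weight: {theta:.3f}")
```

A weight near zero means essentially no mean reversion, while a weight near one means the deviation is almost fully corrected within the period.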


Pretests for STAR Models 


It is not possible to directly perform an LM test of the presence of ESTAR or LSTAR behavior. Consider the LSTAR model

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + (\beta_0 + \beta_1 y_{t-1})[1 + \exp(-\gamma(y_{t-1} - c))]^{-1} + \varepsilon_t$$

For this model, the null hypothesis that the model is linear is equivalent to setting $\gamma = 0$. You should be able to see the problem with using the LM test. If $\gamma = 0$, the magnitudes of $\beta_0$, $\beta_1$, and $c$ are completely irrelevant because the model degenerates into the linear process $y_t = \gamma_0 + \gamma_1 y_{t-1} + \varepsilon_t$, where $\gamma_0 = \alpha_0 + \beta_0/2$ and $\gamma_1 = \alpha_1 + \beta_1/2$. The point is that the values of $\beta_0$, $\beta_1$, and $c$ are unidentified under the null hypothesis that the model is linear. For example, when $\gamma = 0$, the parameter values $\alpha_0 = 1$ and $\beta_0 = 0$ yield identical results to those for $\alpha_0 = 0$ and $\beta_0 = 2$. As such, it is not possible to test for the STAR form of nonlinearity using a standard LM test. It is worth a few minutes of your time to try the following exercise. Find the partial derivatives of the LSTAR model and evaluate each under the null hypothesis $\gamma = 0$. Indicate the functional form of the resulting auxiliary regression (Hint: $\partial y_t/\partial c$ evaluated at $\gamma = 0$ is zero).
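The identification problem can be checked numerically. The following is only a minimal sketch: the function name `lstar_mean` and the particular values chosen for the slope coefficients and for c are illustrative assumptions, not values taken from the text. It evaluates the conditional mean of the first-order LSTAR model at $\gamma = 0$ for the two parameter sets mentioned above.

```python
import numpy as np

def lstar_mean(y_lag, a0, a1, b0, b1, gamma, c):
    """Conditional mean of the first-order LSTAR model (hypothetical helper)."""
    theta = 1.0 / (1.0 + np.exp(-gamma * (y_lag - c)))
    return a0 + a1 * y_lag + (b0 + b1 * y_lag) * theta

y_lag = np.linspace(-3.0, 3.0, 7)

# With gamma = 0 the transition function equals 1/2 everywhere, so c drops out
# and only alpha_0 + beta_0/2 and alpha_1 + beta_1/2 matter.
m1 = lstar_mean(y_lag, a0=1.0, a1=0.5, b0=0.0, b1=0.2, gamma=0.0, c=-4.0)
m2 = lstar_mean(y_lag, a0=0.0, a1=0.5, b0=2.0, b1=0.2, gamma=0.0, c=9.0)

# Both collapse to the same linear process 1 + 0.6 * y_{t-1}.
linear = 1.0 + 0.6 * y_lag
print(np.allclose(m1, m2), np.allclose(m1, linear))
```

Because the two parameter sets cannot be told apart from the data when $\gamma = 0$, the usual LM machinery has nothing to work with under the null.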

Since the LM test fails for LSTAR (and ESTAR) adjustment, other means are necessary to detect the presence of a smooth transition model. In contrast to a supremum test, Teräsvirta (1994) develops a simple framework that can often detect the presence of nonlinear behavior. Moreover, the method can be used to determine whether a series is best modeled as an LSTAR or an ESTAR process. The test is based on a Taylor series expansion of the general STAR model. For the LSTAR model, we can write $\theta$ as

$$\theta = [1 + \exp(-\gamma(y_{t-d} - c))]^{-1} = [1 + \exp(-h_{t-d})]^{-1}$$

so that $h_{t-d} = \gamma(y_{t-d} - c)$.

Now, the trick is to take a third-order Taylor series approximation of $\theta$ with respect to $h_{t-d}$ evaluated at $h_{t-d} = 0$. Recall that a Taylor series expansion of $\theta$ will have the form:¹

$$\theta = \theta(0) + \theta'(0)h_{t-d} + \theta''(0)h_{t-d}^2/2 + \theta'''(0)h_{t-d}^3/6$$

where $\theta(0)$, $\theta'(0)$, $\theta''(0)$, and $\theta'''(0)$ denote $\theta$ and its first three derivatives evaluated at $h_{t-d} = 0$.
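Although taking the partial derivatives is a bit tedious, after a bit of manipulation it is possible to obtain

$$\frac{\partial\theta}{\partial h_{t-d}} = \frac{e^{-h_{t-d}}}{(1 + e^{-h_{t-d}})^2}, \quad \text{which equals } 1/4 \text{ at } h_{t-d} = 0$$

$$\frac{\partial^2\theta}{\partial h_{t-d}^2} = \frac{-e^{-h_{t-d}}(1 - e^{-h_{t-d}})}{(1 + e^{-h_{t-d}})^3}, \quad \text{which equals } 0 \text{ at } h_{t-d} = 0$$

$$\frac{\partial^3\theta}{\partial h_{t-d}^3} = \frac{e^{-h_{t-d}}(1 + e^{-2h_{t-d}} - 4e^{-h_{t-d}})}{(1 + e^{-h_{t-d}})^4}, \quad \text{which equals } -1/8 \text{ at } h_{t-d} = 0$$

Hence, apart from the constant $\theta(0) = 1/2$, which is absorbed into the linear coefficients, the desired expansion of $\theta$ has the form

$$\theta = h_{t-d}/4 - h_{t-d}^3/48$$

so that

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + (\beta_0 + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p})(h_{t-d}/4 - h_{t-d}^3/48) + \varepsilon_t$$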
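A quick numerical check of the expansion may be reassuring. This is a sketch only; the function names and the grid of $h$ values are illustrative assumptions. It compares the logistic function with its third-order approximation, with the constant $\theta(0) = 1/2$ written out explicitly.

```python
import numpy as np

def logistic(h):
    """theta as a function of h = gamma * (y_{t-d} - c)."""
    return 1.0 / (1.0 + np.exp(-h))

def taylor3(h):
    """Third-order expansion around h = 0: 1/2 + h/4 - h^3/48."""
    return 0.5 + h / 4.0 - h**3 / 48.0

h = np.array([-0.5, -0.1, 0.0, 0.1, 0.5])
err = np.abs(logistic(h) - taylor3(h))
print(err.max())  # small near h = 0; the approximation worsens as |h| grows
```

The approximation is local, which is why the auxiliary regression is used as a test for nonlinearity rather than as an estimate of the STAR model itself.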

Since $h_{t-d}$ is linear in $y_{t-d}$ [i.e., $h_{t-d} = \gamma(y_{t-d} - c)$], we can write the approximation of the LSTAR model in the form:

$$\begin{aligned} y_t = {} & a_0 + a_1 y_{t-1} + \cdots + a_p y_{t-p} + a_{11} y_{t-1} y_{t-d} + \cdots + a_{1p} y_{t-p} y_{t-d} \\ & + a_{21} y_{t-1} y_{t-d}^2 + \cdots + a_{2p} y_{t-p} y_{t-d}^2 \\ & + a_{31} y_{t-1} y_{t-d}^3 + \cdots + a_{3p} y_{t-p} y_{t-d}^3 + \varepsilon_t \end{aligned}$$

Thus, you form the products of the regressors and the first, second, and third powers of $y_{t-d}$ (i.e., $y_{t-d}$, $y_{t-d}^2$, and $y_{t-d}^3$). In essence, you construct a special form of a GAR model as in equation (7.1). Then, you can test for the presence of LSTAR behavior by estimating an auxiliary regression:

$$\begin{aligned} e_t = {} & a_0 + a_1 y_{t-1} + \cdots + a_p y_{t-p} + a_{11} y_{t-1} y_{t-d} + \cdots + a_{1p} y_{t-p} y_{t-d} \\ & + a_{21} y_{t-1} y_{t-d}^2 + \cdots + a_{2p} y_{t-p} y_{t-d}^2 \\ & + a_{31} y_{t-1} y_{t-d}^3 + \cdots + a_{3p} y_{t-p} y_{t-d}^3 + \varepsilon_t \end{aligned} \tag{7.21}$$

The test for linearity is identical to testing the joint restriction that all nonlinear terms are zero (i.e., $a_{11} = \cdots = a_{1p} = a_{21} = \cdots = a_{2p} = a_{31} = \cdots = a_{3p} = 0$). You can perform the test using a standard F-test with 3p degrees of freedom in the numerator. If you are not sure of the delay factor, the recommendation is to run the test using all plausible values of d. The value of d that results in the smallest prob-value (i.e., the value of d providing the best fit) is the best estimate of d.
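The auxiliary regression and the search over d can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: the helper names (`lagmat`, `ssr`, `linearity_F`, `simulate_lstar`) are hypothetical, the simulated series uses standard normal shocks, and d is restricted to d ≤ p so that every candidate uses the same sample. The F-statistic is computed by regressing $y_t$ itself on the restricted and unrestricted regressor sets; because the linear regressors are included in both, this gives the same statistic as using the residuals $e_t$ as the dependent variable.

```python
import numpy as np

def lagmat(y, lags):
    """Columns y_{t-1}, ..., y_{t-lags}, aligned with y_t for t = lags, ..., T-1."""
    T = len(y)
    return np.column_stack([y[lags - i:T - i] for i in range(1, lags + 1)])

def ssr(X, z):
    """Sum of squared residuals from an OLS regression of z on X."""
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    return float(resid @ resid)

def linearity_F(y, p, d):
    """F-statistic for the 3p nonlinear terms of the auxiliary regression (d <= p)."""
    y = np.asarray(y, dtype=float)
    z = y[p:]
    lags = lagmat(y, p)
    X0 = np.column_stack([np.ones(len(z)), lags])          # linear AR(p) part
    yd = lags[:, [d - 1]]                                   # y_{t-d}
    X1 = np.hstack([X0] + [lags * yd**k for k in (1, 2, 3)])
    s0, s1 = ssr(X0, z), ssr(X1, z)
    q, dof = 3 * p, len(z) - X1.shape[1]
    return ((s0 - s1) / q) / (s1 / dof)

def simulate_lstar(T, rng, burn=50):
    """Simulate an LSTAR process with a sharp transition at 5 (N(0,1) shocks assumed)."""
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        theta = 1.0 / (1.0 + np.exp(-10.0 * (y[t - 1] - 5.0)))
        y[t] = 1.0 + 0.9 * y[t - 1] + (-3.0 - 1.7 * y[t - 1]) * theta + rng.standard_normal()
    return y[burn:]

y = simulate_lstar(250, np.random.default_rng(0))
F = {d: linearity_F(y, p=2, d=d) for d in (1, 2)}
best_d = max(F, key=F.get)   # same degrees of freedom, so largest F = smallest prob-value
print(F, best_d)
```

Since every candidate d here shares the same numerator and denominator degrees of freedom, ranking by the F-statistic is equivalent to ranking by prob-value.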


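With all of the background work completed, it is straightforward to rework the details for an ESTAR model. Let $\theta$ be:

$$\theta = 1 - \exp(-h_{t-d}^2)$$

so that $h_{t-d} = \gamma^{1/2}(y_{t-d} - c)$. Now, the partial derivatives are given by:

Derivative                                      Equals                                                        Evaluated at $h_{t-d} = 0$
$\partial\theta/\partial h_{t-d}$               $2h_{t-d}\exp(-h_{t-d}^2)$                                    0
$\partial^2\theta/\partial h_{t-d}^2$           $2\exp(-h_{t-d}^2) - 4h_{t-d}^2\exp(-h_{t-d}^2)$              2
$\partial^3\theta/\partial h_{t-d}^3$           $-12h_{t-d}\exp(-h_{t-d}^2) + 8h_{t-d}^3\exp(-h_{t-d}^2)$     0

Unlike the LSTAR model, the expansion for the ESTAR model has the quadratic form: $\theta = h_{t-d}^2$. Thus, we can write the expansion of the ESTAR model without $h_{t-d}$ and $h_{t-d}^3$. Hence, the Taylor series approximation has the form:

$$\begin{aligned} y_t &= \alpha_0 + \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + (\beta_0 + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p})(h_{t-d}^2) + \varepsilon_t \\ &= a_0 + a_1 y_{t-1} + \cdots + a_p y_{t-p} + a_{11} y_{t-1} y_{t-d} + \cdots + a_{1p} y_{t-p} y_{t-d} \\ &\quad + a_{21} y_{t-1} y_{t-d}^2 + \cdots + a_{2p} y_{t-p} y_{t-d}^2 + \varepsilon_t \end{aligned}$$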


The key insight in Teräsvirta (1994) is that the auxiliary equation for the ESTAR 
model is nested within that for an LSTAR model. If the ESTAR is appropriate, it should 
be possible to exclude all of the terms multiplied by the cubic expression $y_{t-d}^3$ from (7.21). Hence, the testing procedure follows these steps:

STEP 1: Estimate the linear portion of the AR(p) model to determine the order p and to obtain the residuals $\{e_t\}$.

STEP 2: Estimate the auxiliary equation (7.21). Test the significance of the entire regression by comparing $TR^2$ to the critical value of $\chi^2$. If the calculated value of $TR^2$ exceeds the critical value from a $\chi^2$ table, reject the null hypothesis of linearity and accept the alternative hypothesis of a smooth transition model. (Alternatively, you can perform an F-test.)

STEP 3: If you accept the alternative hypothesis (i.e., if the model is nonlinear), test the restriction $a_{31} = a_{32} = \cdots = a_{3p} = 0$ using an F-test. If you reject the hypothesis $a_{31} = a_{32} = \cdots = a_{3p} = 0$, the model has the LSTAR form. If you accept the restriction, conclude that the model has the ESTAR form.
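The three steps can be strung together as follows. This is a hedged sketch rather than a definitive implementation: `classify_star` and its arguments are hypothetical names, the F-test variant of STEP 2 is used instead of $TR^2$, the critical values are supplied by the user (the cubic-term value of 6.7 below is an approximate 1% value assumed for illustration), and the demonstration series is simulated with standard normal shocks.

```python
import numpy as np

def ssr(X, z):
    """Sum of squared residuals from an OLS regression of z on X."""
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    return float(resid @ resid)

def f_stat(X_r, X_u, z):
    """F-statistic for the restrictions that reduce X_u to the nested X_r."""
    q = X_u.shape[1] - X_r.shape[1]
    dof = len(z) - X_u.shape[1]
    s_r, s_u = ssr(X_r, z), ssr(X_u, z)
    return ((s_r - s_u) / q) / (s_u / dof)

def classify_star(y, p, d, crit_linearity, crit_cubic):
    """Return (label, F for linearity, F for the cubic terms); requires d <= p."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    z = y[p:]
    lags = np.column_stack([y[p - i:T - i] for i in range(1, p + 1)])
    linear = np.hstack([np.ones((len(z), 1)), lags])
    # STEP 1: residuals from the linear AR(p) model
    e = z - linear @ np.linalg.lstsq(linear, z, rcond=None)[0]
    yd = lags[:, [d - 1]]
    quad = np.hstack([linear, lags * yd, lags * yd**2])
    full = np.hstack([quad, lags * yd**3])            # regressors of (7.21)
    # STEP 2: joint test of all nonlinear terms
    f_lin = f_stat(linear, full, e)
    if f_lin <= crit_linearity:
        return "linear", f_lin, None
    # STEP 3: test whether the cubic terms can be excluded
    f_cubic = f_stat(quad, full, e)
    return ("LSTAR" if f_cubic > crit_cubic else "ESTAR"), f_lin, f_cubic

# Demonstration on a simulated LSTAR series (N(0,1) shocks are an assumption).
rng = np.random.default_rng(1)
y = np.zeros(300)
for t in range(1, 300):
    theta = 1.0 / (1.0 + np.exp(-10.0 * (y[t - 1] - 5.0)))
    y[t] = 1.0 + 0.9 * y[t - 1] + (-3.0 - 1.7 * y[t - 1]) * theta + rng.standard_normal()
label, f_lin, f_cubic = classify_star(y[50:], p=1, d=1, crit_linearity=3.86, crit_cubic=6.7)
print(label, f_lin, f_cubic)
```

Passing the critical values in as arguments keeps the sketch free of any dependence on a statistics library; in practice you would look them up for the relevant degrees of freedom.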


Sometimes the tests for ESTAR versus LSTAR behavior outlined in STEP 3 may 
not be clear cut. In such circumstances, Lin and Teräsvirta (1994) recommend the fol- 
lowing procedure. To keep the notation compact, write the auxiliary equation given by 
(7.21) as 


$$e_t = a_0 + A(L)y_{t-1} + (\beta_0 + B(L)y_{t-1})[\pi_1 h_{t-d} + \pi_2 h_{t-d}^2 + \pi_3 h_{t-d}^3] + \varepsilon_t$$

where $[\pi_1 h_{t-d} + \pi_2 h_{t-d}^2 + \pi_3 h_{t-d}^3]$ is the Taylor series approximation of $\theta$ and $A(L)$ and $B(L)$ are polynomials in the lag operator $L$. Consider the following hypotheses:

$H_0$: All coefficients of $(\beta_0 + B(L)y_{t-1})[\pi_1 h_{t-d} + \pi_2 h_{t-d}^2 + \pi_3 h_{t-d}^3] = 0$

$H_1$: All coefficients of $(\beta_0 + B(L)y_{t-1})\pi_3 h_{t-d}^3 = 0$

$H_2$: All coefficients of $(\beta_0 + B(L)y_{t-1})\pi_2 h_{t-d}^2 = 0$ given that all coefficients of $(\beta_0 + B(L)y_{t-1})\pi_3 h_{t-d}^3 = 0$

$H_3$: All coefficients of $(\beta_0 + B(L)y_{t-1})\pi_1 h_{t-d} = 0$ given that all coefficients of $(\beta_0 + B(L)y_{t-1})[\pi_2 h_{t-d}^2 + \pi_3 h_{t-d}^3] = 0$

As above, if you cannot reject $H_0$, simply conclude that the model is linear. However, if you can reject $H_0$, obtain the prob-values for $H_1$, $H_2$, and $H_3$. Since $\pi_2$ should be zero with an LSTAR but not an ESTAR process, if $H_2$ has the smallest prob-value (so that the restriction is more binding than the others), conclude that you have an ESTAR process. Since $\pi_1$ and $\pi_3$ should be zero with an ESTAR process, if either $H_1$ or $H_3$ has the smallest prob-value, conclude that you have an LSTAR process.


8. OTHER REGIME SWITCHING MODELS 


The artificial neural network and the Markov switching model represent other types of 
regime switching models that appear in the literature. Although they cannot be readily 
estimated by OLS, it is worthwhile to review their properties. 


The Artificial Neural Network 


The artificial neural network (ANN) can be useful for nonlinear processes that have an 
unknown functional form. The simple form of the ANN model is 


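$$y_t = a_0 + a_1 y_{t-1} + \sum_{i=1}^{n} \alpha_i f_i(y_{t-1}) + \varepsilon_t \tag{7.22}$$

where the function $f_i(y_{t-1})$ is a cumulative distribution or a logistic function such as that in (7.20). For the case of the logistic function, we can write

$$y_t = a_0 + a_1 y_{t-1} + \sum_{i=1}^{n} \alpha_i [1 + \exp(-\gamma_i(y_{t-1} - c_i))]^{-1} + \varepsilon_t$$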

Although the ANN is very similar to the LSTAR model, there are some important 
differences. First, the ANN allows only the intercept to be time-varying; the autoregressive coefficient $a_1$ is constant. As such, the level of the series is changing over
time. Second, the ANN uses n different logistic functions (called nodes). Kuan and 
White (1994) prove that, for sufficiently large n, this type of model can approximate 
any first-order nonlinear model arbitrarily closely. As such, the ANN is particularly 
useful for estimating nonlinear relationships that have an unknown functional form. 

Although the model can fit the data extraordinarily well, there is an obvious diffi- 
culty in that the model does not have a clear economic interpretation. Since the ANN 
can be extended to high-order autoregressive processes, it can have an extremely large 
number of parameters. As such, there is a danger of overfitting the data. If you let n 
become too large (i.e., if you use too many nodes) you will wind up fitting the noise 
component of the data. The fact that $R^2 \to 1$ as n grows increasingly large should not
be especially comforting if the goal is to forecast subsequent values of the series. Many 
researchers would select the value of n using the parsimonious SBC. 
Notice that the parameters are not globally identified for n > 1. Numerical optimization routines have difficulty finding the parameter values that minimize the sum of squared residuals since many local minima often exist. To circumvent the problem, a number of different routines are used to estimate the parameter values. Although the details are not necessary for our purposes, it is instructive to consider the “recursive learning” method discussed in White (1989). Suppose you use the first t observations of your data set to obtain the nonlinear least squares estimates of the parameters. Let $\phi_t$ denote the vector of estimated parameters using these t observations and let $\hat{y}_{t+1}$ denote the predicted value of $y_{t+1}$. The value of $\phi_t$ acts as an initial condition in the difference equation:

$$\phi_{t+1} = \phi_t + \eta_t(y_{t+1} - \hat{y}_{t+1})$$

where $\eta_t$ is generally taken to be a multiple of the vector of partial derivatives of (7.22) with respect to the parameters evaluated at the point estimates of $\phi_t$. The successive values of $\phi_{t+1}$ are obtained until all the parameter estimates converge.

We can follow White (1989) and explore the ability of the ANN to mimic chaos. 
Recall that a sequence {y,} is said to be chaotic if it is generated from a deterministic 
difference equation such that it does not explode or converge to a constant or to a repet- 
itive cycle. Thus, the sequence may appear to be random even though it is completely 
deterministic. In particular, let $y_1 = y_2 = 0.5$ and suppose that the next 98 values of the $\{y_t\}$ sequence are generated according to

$$y_t = 1 - 1.4y_{t-1}^2 + 0.3y_{t-2}$$

The actual and fitted values of the series are shown in Figure 7.9. Although just two nodes were used to estimate the series, the fit of the ANN is quite reasonable. The example illustrates the point that the ANN is capable of capturing a highly nonlinear process when the functional form is completely unknown.

FIGURE 7.9 The ANN Fitted to Chaos
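The deterministic sequence used in this example is easy to reproduce. The sketch below only generates the 100 values from the difference equation with both starting values set to 0.5; it does not fit the ANN, and the function name `chaotic_sequence` is an assumption made for illustration.

```python
import numpy as np

def chaotic_sequence(n=100, y1=0.5, y2=0.5):
    """Generate y_t = 1 - 1.4*y_{t-1}^2 + 0.3*y_{t-2} from two starting values."""
    y = np.empty(n)
    y[0], y[1] = y1, y2
    for t in range(2, n):
        y[t] = 1.0 - 1.4 * y[t - 1] ** 2 + 0.3 * y[t - 2]
    return y

y = chaotic_sequence()
print(y[:4])              # the third and fourth values are 0.8 and 0.254
print(y.min(), y.max())   # the path stays bounded without settling into a cycle
```

Plotting `y` against time gives a series that looks random even though no shock enters the equation.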


The Markov Switching Model 


The basic threshold model allows the regime switch to depend on the magnitude of an 
observable variable. If $y_{t-1}$ exceeds some threshold value, the system is in regime one; otherwise, the system is in regime two. Although regime switching is more gradual in the STAR and ANN models, the adjustment process depends on the current state of the system. In contrast, the Markov switching model developed by Hamilton (1989) posits that regime switches are exogenous. To take a simple example, suppose there are two regimes (or states of the world) and that the autoregressive process for $y_t$ is regime-dependent. In particular, let:

$$y_t = a_{10} + a_1 y_{t-1} + \varepsilon_{1t} \quad \text{if the system is in regime 1}$$

$$y_t = a_{20} + a_2 y_{t-1} + \varepsilon_{2t} \quad \text{if the system is in regime 2}$$


At this point, the model looks very much like a TAR model of (7.15) in that the autoregressive coefficient is $a_1$ in regime 1 and $a_2$ in regime 2. However, in contrast to the TAR model, there are fixed probabilities of a regime change. If $p_{11}$ denotes the probability that the system remains in regime one, $(1 - p_{11})$ denotes the probability that the system switches from regime one to regime two. Similarly, if $p_{22}$ denotes the probability that the system remains in regime two, $(1 - p_{22})$ is the probability that the system switches from regime two to regime one. Thus, the switching process is actually a first-order Markov process. No attempt is made to explain the reason that regime changes occur and no attempt is made to explain the timing of such changes. There are several important features of the Markov switching model:


1. Since the transition probabilities (i.e., $p_{11}$ and $p_{22}$) are unknown, they need to be estimated along with the coefficients of the two autoregressive processes. As in the TAR model, if one of the regimes rarely occurs, the coefficients for that regime will be poorly estimated.

2. The overall degree of persistence depends on the autoregressive parameters and the transition probabilities. For example, if $a_1 > a_2$ and $p_{11}$ is large, the process will tend to remain in the regime with substantial autoregressive persistence. Moreover, if $p_{22}$ is small, the system will have a tendency to switch into regime one from regime two.

3. The probabilities $p_{11}$, $(1 - p_{11})$, $p_{22}$, and $(1 - p_{22})$ are all conditional probabilities. For example, if the system is in regime two, $(1 - p_{22})$ is the conditional probability that the system switches into regime one. It is also of interest to calculate the unconditional probability that the system is in regime one ($p_1$) and in regime two ($p_2$). In Exercise 3 at the end of this chapter, you are asked to show that

$$p_1 = (1 - p_{22})/(2 - p_{11} - p_{22})$$

$$p_2 = (1 - p_{11})/(2 - p_{11} - p_{22})$$

Thus, if $p_{11} = 0.75$ and $p_{22} = 0.5$, $p_1 = 2/3$ and $p_2 = 1/3$.

4. A number of papers, including Clements and Krolzig (1998), try to use various statistical means to distinguish between a Markov switching model and a STAR model. It is very difficult to do so, especially if the Markov switching model is modified to allow the transition probabilities to depend on the variables in the model.
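The unconditional probabilities in the third item of the list can be verified numerically. The sketch below (the function name `unconditional_probs` is an assumption) evaluates the two formulas for $p_{11} = 0.75$ and $p_{22} = 0.5$ and compares them with the long-run distribution obtained by repeatedly applying the transition matrix.

```python
import numpy as np

def unconditional_probs(p11, p22):
    """Unconditional probabilities of regime one and regime two."""
    denom = 2.0 - p11 - p22
    return (1.0 - p22) / denom, (1.0 - p11) / denom

p1, p2 = unconditional_probs(0.75, 0.5)

# Transition matrix: row i holds the probabilities of moving from regime i.
P = np.array([[0.75, 0.25],
              [0.50, 0.50]])
dist = np.array([1.0, 0.0])          # start in regime one with certainty
for _ in range(200):
    dist = dist @ P                  # iterate toward the stationary distribution

print(p1, p2, dist)
```

The iteration converges quickly here because the second eigenvalue of the transition matrix, $p_{11} + p_{22} - 1 = 0.25$, is well inside the unit circle.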


Usually, Markov switching models are applied to estimate the level of a series. 
However, Edwards and Susmel (2000) use a regime-switching model to examine the 
interest rate volatility in emerging markets. It is argued that the standard GARCH 
model is not applicable to emerging markets because of the occurrence of large shocks. 
Although a GARCH model estimated using a t-distribution could account for fat-tailed
returns, such models will typically predict too much volatility persistence. As illus- 
trated in Chapter 3, the sum of the coefficients in a GARCH model is often close to 
unity. As an alternative, consider a three-state model containing a low-volatility regime, 
a moderate volatility regime and a high-volatility regime. If the probability of switch- 
ing out of a high-volatility state is large, high volatility does not need to be extremely 
persistent. 

Edwards and Susmel use weekly interest rate data for Argentina, Brazil, Chile, 
Hong Kong, and Mexico. They begin by estimating an AR(1) equation for the model 
of the mean and a GARCH(1, 1) model for the variance. Consider the estimated set 
of equations for Brazil (with standard errors in parentheses) over the April 18, 1994, 
through April 16, 1999, period: 


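$$\Delta r_t = \underset{(0.04)}{-0.0133} - \underset{(0.10)}{0.217}\,\Delta r_{t-1} + \varepsilon_t$$

$$h_t = \underset{(0.03)}{0.058} + \underset{(0.25)}{1.321}\,\varepsilon_{t-1}^2 + \underset{(0.05)}{0.395}\,h_{t-1}$$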


where r, is the Brazilian short-term interest rate and h, is the conditional variance. The 
model of the mean is in first differences since r, is a unit root process. 

Although the coefficients are significant at the 5% level, there is a disturbing fea- 
ture of the model. Notice that the sum of the coefficients in the equation for h, exceeds 
unity. As such, the model predicts that volatility is explosive. As an alternative, Edwards and Susmel consider the volatility switching ARCH (SWARCH) model. The basic form of the model is

$$h_t/\gamma_s = \alpha_0 + \sum_{i=1}^{q} \alpha_i(\varepsilon_{t-i}^2/\gamma_s)$$


where s = 1, 2 or 3 refers to the current state (i.e., low, moderate, or high). 

Note that one of the values of $\gamma_s$ must be normalized to equal unity. Moreover, if $\gamma_1 = 1$, the other values of $\gamma_s$ measure the ratio of the conditional variance in state s relative to that in state 1. The estimated SWARCH model for Brazil is

$$\Delta r_t = \underset{(0.03)}{-0.087} + \underset{(0.05)}{0.016}\,\Delta r_{t-1} + \varepsilon_t$$

$$h_t/\gamma_s = \underset{(0.03)}{0.131} + \underset{(0.10)}{0.068}\,\varepsilon_{t-1}^2/\gamma_s$$

and

$$\gamma_1 = 1, \quad \gamma_2 = 4.851, \quad \text{and} \quad \gamma_3 = 128.51$$


It is striking that the high-volatility state is more than 128 times as volatile as
the low-volatility state. Nevertheless, the probability of a switch from the high-volatility 
state to the other states was found to be high. Hence, the high-volatility state was found 
to be short-lived. 


9. ESTIMATES OF STAR MODELS 


This section illustrates a number of techniques used in the estimation of regime 
switching models. The goal is to demonstrate a number of practical issues that arise in 
applied work. 


An LSTAR Model 


To illustrate the process of estimating an LSTAR model, 250 realizations of the follow- 
ing sequence were generated: 


$$y_t = 1 + 0.9y_{t-1} + (-3 - 1.7y_{t-1})/[1 + \exp(-10(y_{t-1} - 5))] + \varepsilon_t \tag{7.23}$$


You can follow along using the data in the file LSTAR.XLS. If you compare (7.23) to (7.20) you will see that the smoothness parameter $\gamma = 10$ and that $\theta = 1/[1 + \exp(-10(y_{t-1} - 5))]$. As $y_{t-1} \to -\infty$, the behavior of $y_t$ is governed by the autoregressive process $1 + 0.9y_{t-1} + \varepsilon_t$, and as $y_{t-1} \to +\infty$, the behavior of $y_t$ is governed by $-2 - 0.8y_{t-1} + \varepsilon_t$. Note that in the neighborhood of $y_{t-1} = 0$, the value of $\theta$ is approximately equal to zero. The 250 realizations are shown in Figure 7.10.

FIGURE 7.10 The Simulated LSTAR Process

The simulated sequence has a sample mean of 0.62 and a standard deviation of 3.43. The first six autocorrelations are

$\rho_1$     $\rho_2$     $\rho_3$     $\rho_4$      $\rho_5$      $\rho_6$
0.552    0.270    0.067    −0.039    −0.136    −0.161
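A series of this kind can be generated with a few lines of code. The sketch below is illustrative only: it assumes standard normal shocks, a burn-in period, and a particular random seed, none of which are stated in the text, so the resulting sample moments and autocorrelations will differ from those of the series in LSTAR.XLS. The helper names are hypothetical.

```python
import numpy as np

def lstar_mean(y_lag):
    """Deterministic part of (7.23): 1 + 0.9*y + (-3 - 1.7*y)*theta."""
    theta = 1.0 / (1.0 + np.exp(-10.0 * (y_lag - 5.0)))
    return 1.0 + 0.9 * y_lag + (-3.0 - 1.7 * y_lag) * theta

def simulate_lstar(T=250, burn=50, seed=0):
    """Simulate T realizations after discarding a burn-in (N(0,1) shocks assumed)."""
    rng = np.random.default_rng(seed)
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        y[t] = lstar_mean(y[t - 1]) + rng.standard_normal()
    return y[burn:]

def sample_acf(y, nlags):
    """Sample autocorrelations for lags 1, ..., nlags."""
    dev = np.asarray(y, dtype=float) - np.mean(y)
    denom = dev @ dev
    return np.array([(dev[k:] @ dev[:-k]) / denom for k in range(1, nlags + 1)])

y = simulate_lstar()
print(y.mean(), y.std(ddof=1))
print(sample_acf(y, 6))
```

Note how the deterministic part behaves like $1 + 0.9y_{t-1}$ near zero and like $-2 - 0.8y_{t-1}$ for large positive values of $y_{t-1}$.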


The first few autocorrelations seem to exhibit geometric decay and those for lags 5 and 6 have prob-values near 5% ($2 \cdot 250^{-1/2} = 0.1265$). If we did not know the actual data-generating process, we might be tempted to estimate the series as a linear AR(1) process. In fact, the estimated linear model looks to be quite plausible; consider:

$$y_t = \underset{(1.50)}{0.278} + \underset{(10.42)}{0.552}\,y_{t-1} + \varepsilon_t \tag{7.24}$$

and

AIC = 1901.19    SBC = 1908.22


The residual autocorrelations are such that there is no linear relationship in the 
residuals. The first 12 autocorrelations of the residuals are 


$\rho_1$    $\rho_2$    $\rho_3$    $\rho_4$    $\rho_5$    $\rho_6$    $\rho_7$    $\rho_8$    $\rho_9$    $\rho_{10}$    $\rho_{11}$    $\rho_{12}$
0.03    0.01    −0.06    −0.02    −0.1    −0.04    −0.11    −0.09    0.07    −0.00    −0.05    −0.06


The Ljung—Box Q-statistics are such that the prob-values for the first four, eight, 
and 12 lags are 0.900, 0.347, and 0.471, respectively. Since the autocorrelations of the 
residuals are not significant at conventional levels, you might be tempted to conclude 
that the true data-generating process was an AR(1). However, a battery of nonlinear diagnostic tests reveals a very different picture. Note first that the autocorrelations of the squared residuals also suggest that the linear model is adequate. The autocorrelations of the squared residuals are

ACF of the squared residuals

$\rho_1$    $\rho_2$    $\rho_3$    $\rho_4$    $\rho_5$    $\rho_6$    $\rho_7$    $\rho_8$    $\rho_9$    $\rho_{10}$    $\rho_{11}$    $\rho_{12}$
0.03    −0.04    −0.07    −0.10    −0.09    −0.10    −0.08    −0.07    0.14    0.00    −0.02    −0.05


In contrast, the RESET test indicates a nonlinear relationship. Call $e_t$ and $\hat{y}_t$ the residuals and the fitted values from the linear model, respectively. Given that the best-fitting model is an AR(1), we can use the residuals from (7.24) to obtain

$$e_t = \underset{(4.24)}{0.932} + \underset{(9.04)}{0.710}\,y_{t-1} + \underset{(0.64)}{0.058}\,\hat{y}_t^2 - \underset{(-9.39)}{0.157}\,\hat{y}_t^3 - \underset{(-4.84)}{0.034}\,\hat{y}_t^4$$

Notice that most of the individual coefficients appear to be statistically significant. However, you should not rely on the individual t-statistics because the regressors are highly correlated; for example, large values of $\hat{y}_t^2$ will be associated with large values of $\hat{y}_t^4$. The issue is whether the powers of $\hat{y}_t$ have any explanatory power as a group. The F-statistic for the null hypothesis that the coefficients on $\hat{y}_t^2$, $\hat{y}_t^3$, and $\hat{y}_t^4$ all equal zero is 95.60. Since there are three degrees of freedom in the numerator (we impose three restrictions) and 244 in the denominator (250 observations minus 5 estimated coefficients and 1 lost observation resulting from the lagged value $y_{t-1}$), we can reject the null hypothesis at any conventional significance level (the 1% critical value is 3.86). Hence, we conclude that the series exhibits some form of nonlinear behavior.

It is quite a bit more difficult to pin down the form of the nonlinearity. Since the 
data are simulated, there is no possibility of using economic theory to suggest the most 
probable form of nonlinearity. Hence, one way to proceed is to estimate a number of 
nonlinear models and select the one that fits the best. However, the danger of this pro- 
cedure is that you are likely to overfit the data. A more prudent way to proceed is to 
perform a number of Lagrange Multiplier tests to determine which models are likely 
to be the most plausible. 

One test that can be useful to select the functional form is Teräsvirta’s test for LSTAR versus ESTAR behavior. Pretend that we do not know the value of the delay parameter d. It seems natural to begin with d = 1. From the Taylor series expansion for a first-order LSTAR model, we need to regress the residuals from the linear model on the regressors (i.e., a constant and $y_{t-1}$) and on $y_{t-1}$, $y_{t-1}^2$, and $y_{t-1}^3$ multiplied by the regressors. The estimated auxiliary regression is:

$$e_t = \underset{(4.35)}{0.933} + \underset{(9.21)}{0.076}\,y_{t-1} - \underset{(-0.987)}{0.027}\,y_{t-1}^2 - \underset{(-11.52)}{0.039}\,y_{t-1}^3 - \underset{(-4.84)}{0.003}\,y_{t-1}^4$$

The F-statistic for the entire regression is 71.70; with four numerator and 244 denominator degrees of freedom, the regression is highly significant. Moreover, the F-statistic for the presence of the nonlinear terms $y_{t-1}^2$, $y_{t-1}^3$, and $y_{t-1}^4$ is 95.60; with three numerator and 244 denominator degrees of freedom, we can conclude that there is STAR behavior. Next, we can determine if LSTAR or ESTAR behavior is the most appropriate. Given the t-statistic on the coefficient for $y_{t-1}^4$, we cannot exclude this expression from the auxiliary equation. Hence, we can rule out ESTAR behavior in favor of LSTAR behavior. It is possible that the delay parameter is two even though $y_{t-2}$ does not directly appear in the model. To determine whether $y_{t-1}$ or $y_{t-2}$ is the most appropriate threshold variable, you can estimate the following auxiliary equation using d = 2:

$$e_t = 0.738 + 0.047y_{t-1} - 0.158y_{t-1}y_{t-2} - 0.005y_{t-1}y_{t-2}^2 + 0.003y_{t-1}y_{t-2}^3$$

The F-value for this regression was only 5.73. Since d = 1 yields a substantially better fit than d = 2, we can conclude that $y_{t-1}$ is the most appropriate threshold variable. Thus, it seems reasonable to estimate a nonlinear model of the form

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + (\beta_0 + \beta_1 y_{t-1})/[1 + \exp(-\gamma(y_{t-1} - c))] + \varepsilon_t$$


Since the coefficients are multiplicative, OLS cannot be used to obtain the least 
squares estimates of the coefficients. Instead, it is standard to estimate such models 
using nonlinear least squares (NLLS) or maximum likelihood estimation. Consider the 
estimates obtained using NLLS 


$$y_t = \underset{(14.43)}{0.941} + \underset{(45.15)}{0.923}\,y_{t-1} + (\underset{(-2.07)}{-5.86} - \underset{(-2.45)}{1.18}\,y_{t-1})/(1 + \exp(-\underset{(6.77)}{11.21}\,(y_{t-1} - \underset{(312.33)}{5.01}))) + \varepsilon_t \tag{7.25}$$

AIC = 1365.22    SBC = 1386.33

The point estimates of all the parameters except $\beta_0$ are very close to their true values. Clearly, $\beta_0$ is poorly estimated since it is only about 2 standard deviations from zero. The AIC and the SBC both select the LSTAR model over the linear model. Note that you need to be wary of the t-statistics for several reasons. First, the nonlinear least squares estimates do not rely on the assumption that the error term is normally distributed. Second, the estimates are all performed using numerical methods so that estimation is not exact and t-statistics can be inflated. Third, some parameters are unidentified under the null hypothesis $\gamma = 0$; clearly, the t-statistic for the null hypothesis $\gamma = 0$ is problematic. Given these caveats, the estimated model does capture the essential features of (7.23).

In many circumstances, the numerical methods used to estimate the parameters of 
STAR models have difficulty in simultaneously finding y and c. It is crucial to provide 
the numerical routine with very good initial guesses. If there are problems, a popular 
modification of Haggan and Ozaki’s (1981) method is to estimate y using a grid search. 
Fix y at its smallest possible value and estimate all of the remaining parameters using 
NLLS. Slightly increase the value of y and reestimate the model. Continue this process 
until the plausible values of y are exhausted. Use the value of y yielding the best fit. 
Note that if y is large, the transition is sharp in the neighborhood of y,_, = c so that 
the LSTAR model acts like a TAR model—in fact, if y is large and convergence to a 
solution is a problem, it could be easier to estimate a TAR model instead of the LSTAR 
model. Teräsvirta (1994) notes that rescaling the expressions in $\theta$ can aid in finding a numerical solution. With an LSTAR model, he found it useful to standardize by dividing the argument of the exponential in $\exp[-\gamma(y_{t-d} - c)]$ by the standard deviation of the $\{y_t\}$ series. With an ESTAR model, he standardized by dividing the argument of the exponential in $\exp[-\gamma(y_{t-d} - c)^2]$ by the variance of the $\{y_t\}$ series. In
this way, the threshold value c is measured in standardized units so that a reasonable 
value for the initial guess (e.g., c = 1 standard deviation) can be readily made. An 
example is shown in Question 5 below. There is an extended discussion of some of 
these issues in Chapter 3 of the Programming Manual accompanying the text. 
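The grid-search idea can be sketched as follows. Note the simplification, which is an assumption of this sketch and not the procedure described above: the grid is taken over both $\gamma$ and c, so that for each pair the model is linear in the remaining coefficients and OLS can replace NLLS. The function names, the grids, and the simulated data (standard normal shocks) are all illustrative.

```python
import numpy as np

def fit_lstar_grid(y, gammas, cs):
    """Grid search over (gamma, c); the other coefficients are obtained by OLS."""
    y = np.asarray(y, dtype=float)
    z, x = y[1:], y[:-1]
    best = None
    for g in gammas:
        for c in cs:
            # Clip the exponent to avoid overflow warnings for extreme values.
            theta = 1.0 / (1.0 + np.exp(np.clip(-g * (x - c), -700.0, 700.0)))
            X = np.column_stack([np.ones_like(x), x, theta, theta * x])
            coef, *_ = np.linalg.lstsq(X, z, rcond=None)
            resid = z - X @ coef
            s = float(resid @ resid)
            if best is None or s < best[0]:
                best = (s, g, c, coef)
    return best

# Simulated LSTAR data with gamma = 10 and c = 5 (shock distribution assumed).
rng = np.random.default_rng(3)
y = np.zeros(300)
for t in range(1, 300):
    theta = 1.0 / (1.0 + np.exp(-10.0 * (y[t - 1] - 5.0)))
    y[t] = 1.0 + 0.9 * y[t - 1] + (-3.0 - 1.7 * y[t - 1]) * theta + rng.standard_normal()
y = y[50:]

best_ssr, g_hat, c_hat, coef = fit_lstar_grid(y, gammas=[1, 2, 5, 10, 20],
                                              cs=np.linspace(3.0, 7.0, 17))
X_lin = np.column_stack([np.ones(len(y) - 1), y[:-1]])
b_lin, *_ = np.linalg.lstsq(X_lin, y[1:], rcond=None)
ssr_linear = float(np.sum((y[1:] - X_lin @ b_lin) ** 2))
print(g_hat, c_hat, best_ssr, ssr_linear)
```

The grid estimates can then serve as the initial guesses for a full NLLS routine; when the best fit occurs at the largest $\gamma$ on the grid, the transition is effectively a sharp threshold.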


The Real Exchange Rate as an ESTAR Process 


As indicated earlier, Michael, Nobay, and Peel (1997) argue that transaction costs 
should make real exchange rates behave as ESTAR processes. For our purposes, the 
series of interest is now the annual observations of the U.K.–U.S. real rate over the
1791 to 1992 period. The first issue is to determine whether or not the rates are sta- 
tionary; after all, if the rates are unit root processes, the theory of PPP fails. As such, 
they use augmented Dickey—Fuller tests to determine whether the series contains a unit 
root. The use of annual data results in very short lags. If we ignore the intercept, the 
estimated equation for the U.K.—U.S. rate is 


$$\Delta y_t = \underset{(-3.62)}{-0.12}\,y_{t-1} + \underset{(1.75)}{0.12}\,\Delta y_{t-1} + \varepsilon_t \tag{7.26}$$

In absolute value, the t-statistic of −3.62 exceeds the critical value reported in the Dickey–Fuller table; as such, it is possible to reject the null hypothesis of a unit root in the real exchange rate. The point estimate of −0.12 implies a fairly slow speed of adjustment; approximately 88% of the current period’s discrepancy from PPP is expected to persist into the next year. Nevertheless, this linear model forces the speed of adjustment to be constant. (Some of the issues concerned with unit roots and nonlinearity are discussed in detail in Section 11.)

Given that the series is stationary, the next issue is to determine whether Teräsvirta’s four-step methodology indicates the presence of ESTAR adjustment. Given the lag lengths, the most plausible value of the delay parameter d is unity. Nevertheless, the authors follow the standard procedure and select the value of d that results in the best fit of the auxiliary equation. As suspected, the value d = 1 fits the data better than the alternatives d = 2 or d = 3. The auxiliary regression has the same form as (7.21). The prob-value of the F-statistic for the null hypothesis that all values of $a_{ij} = 0$ in the U.K.–U.S. auxiliary equation is 0.076. Hence, there is weak evidence of nonlinear behavior in the U.K.–U.S. rate.

Given the presence of threshold adjustment, the next issue is to test for LSTAR versus ESTAR adjustment. The F-test for the null hypothesis that all values of $a_{3i} = 0$ has a prob-value of 0.522; as such, it is not possible to reject the null hypothesis of ESTAR adjustment. Notice that the test for nonlinearity has less power than the test for ESTAR versus linear adjustment. The auxiliary equation for nonlinear adjustment has coefficients for both the LSTAR and ESTAR models. If there is ESTAR adjustment, a number of the coefficients are unnecessary. Hence, the authors constrain all values of $a_{3i} = 0$, and test whether the remaining coefficients are zero. The F-statistic for this test has a prob-value of 0.028. Hence, this test with enhanced power suggests ESTAR versus linear adjustment.


10. GENERALIZED IMPULSE RESPONSES AND FORECASTING


This section presents two different estimated threshold models. Each was selected 
to emphasize a different aspect of the general methodology. First, Potter’s (1995) 
TAR model of U.S. GNP is presented. The interesting feature of Potter’s study is the 
calculation of impulse responses from a TAR model. Second, Enders and Sandler’s 
(2002) forecast function for the number of casualties caused by transnational terrorists 
is examined. 


Nonlinear Estimates of GNP Growth 


Potter (1995) argues that a nonlinear model of U.S. GNP growth performs much better 
than a linear one. To begin, Potter estimates the following AR(5) model of the logarithmic change in the quarterly values of real U.S. gross national product (GNP) over the 1947Q1 to 1990Q4 period:

$$y_t = \underset{(4.42)}{0.540} + \underset{(4.23)}{0.330}\,y_{t-1} + \underset{(2.35)}{0.193}\,y_{t-2} - \underset{(-1.27)}{0.105}\,y_{t-3} - \underset{(-1.12)}{0.092}\,y_{t-4} - \underset{(-0.308)}{0.024}\,y_{t-5} + \varepsilon_t \qquad \text{AIC}^* = 8.00$$

where $y_t = 100 \cdot [\log(GNP_t) - \log(GNP_{t-1})]$.

Potter also estimates a two-regime TAR model allowing the variances to differ across regimes. He states that pre-testing yields a delay factor of 2 (i.e., d = 2) and a threshold of zero. After purging the threshold regression of insignificant coefficients, Potter reports the following TAR model:

$$y_t = \underset{(3.21)}{0.517} + \underset{(3.74)}{0.299}\,y_{t-1} + \underset{(1.77)}{0.189}\,y_{t-2} - \underset{(-16.57)}{1.143}\,y_{t-5} + \varepsilon_{1t}, \quad y_{t-2} > 0$$

$$y_t = \underset{(-1.91)}{-0.808} + \underset{(2.79)}{0.516}\,y_{t-1} - \underset{(-2.68)}{0.946}\,y_{t-2} - \underset{(-1.63)}{0.352}\,y_{t-5} + \varepsilon_{2t}, \quad y_{t-2} \le 0$$


The presence of the AR(5) terms is unusual because the data are seasonally adjusted and there is no particular reason to suppose that the fifth lag (but not lags 3 and 4) affects the contemporaneous value of GNP. However, Potter reports that the AR(3) and AR(4) coefficients are not statistically different from zero at the 5% significance level. There are 37 observations in the contractionary regime ($y_{t-2} \le 0$) and 133 observations in the expansionary regime ($y_{t-2} > 0$). The estimated variance of $\varepsilon_{1t}$ is 0.763 and the estimated variance of $\varepsilon_{2t}$ is 1.50. Thus, the magnitudes of shocks while in the contractionary regime tend to be quite large. The large negative coefficient in the AR(2) term in the contractionary regime has an interesting economic implication. When $y_{t-2} < 0$, there tends to be a sharp reversal in the contraction of output since the product of −0.946 and $y_{t-2}$ is positive.


RECURSIVE FORECASTS The AIC was constructed by combining the residual sums of squares from the two segments of the TAR model. This value of the AIC* (= −4.89) clearly selects the TAR model over the linear model. In order to compare the out-of-sample forecasts, the following procedure was used. Beginning with the sample period 1947Q1 through 1960Q1, linear and TAR models were estimated. For each model, the one-step-ahead forecast was obtained. Then, the sample period was updated by one quarter and new linear and TAR models were estimated. These updated models were used to obtain one-step-ahead forecasts. Repeating this procedure through the end of the sample yielded two sets of one-step-ahead forecasts. The correlation of the forecasts with the actual values of output growth was 0.23 for the linear model and 0.35 for the TAR model. As such, the forecasting performance of the TAR model exceeds that of the linear model.
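The recursive (expanding-window) procedure can be sketched as follows. This is a minimal illustration under stated assumptions: a linear AR(1) stands in for both the linear and TAR models, the function names are hypothetical, and the data are an artificial noiseless series used only so that the mechanics can be checked.

```python
import numpy as np

def ar1_fit(y):
    """OLS estimates (intercept, slope) of an AR(1) using the sample y."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef

def recursive_one_step(y, first_end):
    """Re-estimate on y[:end] for each end >= first_end and forecast y[end]."""
    y = np.asarray(y, dtype=float)
    forecasts, actuals = [], []
    for end in range(first_end, len(y)):
        a, b = ar1_fit(y[:end])                 # sample grows by one period each pass
        forecasts.append(a + b * y[end - 1])    # one-step-ahead forecast
        actuals.append(y[end])
    return np.array(forecasts), np.array(actuals)

# Artificial noiseless AR(1) with coefficient -0.9, so the forecasts should be exact.
y = (-0.9) ** np.arange(40)
f, a = recursive_one_step(y, first_end=10)
print(len(f), np.corrcoef(f, a)[0, 1])
```

With real data you would run the same loop once per competing model and compare the correlations (or the mean squared errors) of the two forecast series with the actual values.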


Impulse Responses 


In a linear model, the impulse responses are not history dependent and the magnitude 
of the shock does not alter the time-profile of the responses. For example, in the linear 
AR(1) model $y_t = \rho y_{t-1} + \varepsilon_t$, the impulse responses are given by

$$ y_t = \sum_{i=0}^{\infty} \rho^{i} \varepsilon_{t-i} $$

Hence, the effect of a one-unit shock on $y_t$ is 1, the effect of the shock on $y_{t+1}$
is predicted to be $\rho$ (i.e., $\partial y_{t+1}/\partial \varepsilon_t = \partial y_t/\partial \varepsilon_{t-1} = \rho$), the effect of the shock on $y_{t+2}$ is
predicted to be $\rho^2$, and so forth. Moreover, the effects of a two-unit shock are simply
twice those for the one-unit shock and the effects of a negative shock are simply the 
negative of those for positive shocks. However, the interpretation of impulse response 
functions for a nonlinear model is not straightforward. The reason is that the impulse 
responses are history dependent. The effect of an $\varepsilon_t$ shock on the time path of the
system depends on the magnitudes of the current and subsequent shocks. Clearly, the
sign of the shocks can matter. To take a simple example, in a TAR model with $\tau = 0$,
the impulse responses for a one-unit positive shock will have a different time path than
those for a one-unit negative shock. Moreover, the size of the shock matters; if you are in the
contractionary regime, a small positive shock can imply a different time profile than a
very large shock since the small shock is less likely to induce a regime change. Thus, to
calculate impulse responses, it is necessary to specify the history of the system and the
magnitude of the shock. Moreover, the effects of a shock to $\varepsilon_t$ on $y_{t+10}$ will depend on
the magnitudes of the shocks that take place in periods (t + 1) through (t + 9). There are
several ways to attack the problem. Potter considers shocks of four different magnitudes:
-2%, -1%, 1%, and 2%. Moreover, he considers several different histories. Consider:


■ In the three quarters 1983Q3, 1983Q4, and 1984Q1, real GNP growth at an
annual rate was a remarkable 7.1%, 8.2%, and 8.2%, respectively. As such,
even a -2% shock would cause GNP growth to remain in the positive regime.
Hence, the responses are very similar to those that would be obtained from a
linear model. Since there is no regime switching, the responses to the 1% and 2% shocks are
multiples of each other. The four impulse responses for this particular history
are shown in the top panel of Figure 7.11.


■ The situation was very different in 1970Q2 in that the economy experienced
a mild downturn. For 1969Q4, 1970Q1, and 1970Q2, GNP growth measured
at an annual rate was -1.9%, -0.46%, and 0.91%, respectively. Hence, the
negative, but not the positive, shocks push GNP growth across the threshold of
zero. The lower panel of Figure 7.11 shows the asymmetric responses. Given
that the contractionary regime has an AR(2) coefficient that is nearly -1.0,
negative shocks are less persistent than positive shocks. As such, you can see
the rather quick turnaround in GNP growth predicted to begin in 1970Q3.
Also notice that the effects of the -1% and -2% shocks are not proportional to
each other.


Notice that these impulse response functions trace out the effects of different sized
$\varepsilon_t$ shocks (t = 1984Q1 and 1970Q2) assuming subsequent shocks are all zero. Using
the methods discussed below, it is possible to generalize the impulse response functions
to allow for the effects of any ensuing shocks. The general point is that the
impulse responses from a nonlinear model depend on the sign and magnitude of the
shocks as well as the initial state, or history, of the system. Tracing out the effects of a
single period’s shock alone is problematic since it will never be the case that all subsequent
shocks equal zero. To remedy the problem, Koop, Pesaran, and Potter (1996)
develop a generalized impulse response function. Consider the simple TAR model
$y_t = 0.9 I_t y_{t-1} + 0.1 (1 - I_t) y_{t-1} + \varepsilon_t$, where $I_t = 1$ if $y_{t-1} > 0$ and 0 otherwise. To trace out
the effects of a single one-unit shock to $\varepsilon_1$, suppose that the initial value is $y_0 = 0$. As
shown in columns 2 and 3 of Table 7.2, the first value is $y_1 = 1$ and, since there are never
any regime changes, the subsequent values decay at the rate $y_t = 0.9y_{t-1}$. However, this
time path is misleading because it ignores the possibility of regime switching.

FIGURE 7.11 Impulse Responses for Two Histories
Panel (a): Responses for 1984Q1

Columns 4 and 5 of Table 7.2 illustrate the effects of drawing the subsequent
values of {$\varepsilon_t$} from the model’s residuals. Given $\varepsilon_1 = 1$, if the random draws are such
that $\varepsilon_2 = -1$, $\varepsilon_3 = 0$, and $\varepsilon_4 = 1$, column 5 indicates that the resulting values for $y_t$ are
$y_2 = -0.100$, $y_3 = -0.010$, and $y_4 = 0.999$. As you can infer from the table, drawing
$\varepsilon_2$, $\varepsilon_3$, and $\varepsilon_4$ means that decay is not geometric since the process switches between
regimes.


Table 7.2 Impulse Responses

Time   ε_t   y_t      ε_t   y_t      ε_t   y_t^c    d_t = y_t - y_t^c
0            0.000          0.000          0.000
1      1     1.000    1     1.000    0     0.000    1.000
2      0     0.900   -1    -0.100   -1    -1.000    0.900
3      0     0.810    0    -0.010    0    -0.100    0.090
4      0     0.729    1     0.999    1     0.990    0.009

Now, as displayed in column 6, suppose $\varepsilon_1 = 0$ but $\varepsilon_2$, $\varepsilon_3$, and $\varepsilon_4$ are unaltered.
With these values, the alternative values for $y_t$, say $y_t^c$, are $y_1^c = 0.000$, $y_2^c = -1.000$,
$y_3^c = -0.100$, and $y_4^c = 0.990$. (Because $y_2^c < 0$, the series is in the low-persistence
regime, so $y_3^c = 0.1(-1.000) = -0.100$.) The difference between the two series, $d_t$, is shown in the
last column of the table. The difference reflects the effects of a one-unit $\varepsilon_1$ shock on
the $y_t$ series.
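
The three time paths in Table 7.2 follow mechanically from the TAR recursion, so they are easy to check. The sketch below assumes only the model stated in the text ($y_0 = 0$, coefficient 0.9 when $y_{t-1} > 0$ and 0.1 otherwise); the function name `tar_path` is a hypothetical helper.

```python
def tar_path(shocks, y0=0.0):
    """Simulate y_t = 0.9*y_{t-1} + e_t if y_{t-1} > 0, else 0.1*y_{t-1} + e_t."""
    path, y = [], y0
    for e in shocks:
        coef = 0.9 if y > 0 else 0.1   # regime is set by the lagged value
        y = coef * y + e
        path.append(y)
    return path

no_further_shocks = tar_path([1, 0, 0, 0])    # columns 2 and 3
with_shock = tar_path([1, -1, 0, 1])          # columns 4 and 5
without_shock = tar_path([0, -1, 0, 1])       # columns 6 and 7 (e_1 set to zero)

# Last column: effect of the one-unit e_1 shock given the later draws
d = [a - b for a, b in zip(with_shock, without_shock)]

for t, row in enumerate(zip(no_further_shocks, with_shock, without_shock, d), start=1):
    print(t, *(f"{v:7.3f}" for v in row))
```

Holding the later draws fixed across the two simulated paths is what isolates the effect of the initial shock; only $\varepsilon_1$ differs between `with_shock` and `without_shock`.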

Of course, other draws for $\varepsilon_2$, $\varepsilon_3$, and $\varepsilon_4$ would result in different values for $d_t$.
However, it is possible to repeat the process for several thousand Monte Carlo replications.
Taking the sample average of the resultant $d_t$ series yields a generalized impulse
response function for a particular history (i.e., $y_0 = 0$) and for a particular sized shock
($\varepsilon_1 = 1$). The solid line in Figure 7.12 shows the averaged impulse responses from
2000 replications when drawing {$\varepsilon_t$} from a normal distribution with a variance equal
to unity. The dashed line shows the geometric decay resulting from $y_t = 0.9y_{t-1}$. It
should be clear that the impulse responses decay more rapidly than the dashed line
representing $y_t = 0.9y_{t-1}$. The reason is that the generalized impulse responses allow for
the possibility of regime switching.
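
The Monte Carlo averaging can be sketched as follows. This is an illustration under stated assumptions rather than the code behind Figure 7.12: it uses the simple TAR model above with $y_0 = 0$, standard normal draws for the subsequent shocks, and 2000 replications; `step` and `generalized_irf` are hypothetical names, and the seed is arbitrary.

```python
import numpy as np

def step(y_prev, e):
    """One step of the simple TAR model: coefficient 0.9 if y_{t-1} > 0, else 0.1."""
    coef = np.where(y_prev > 0, 0.9, 0.1)
    return coef * y_prev + e

def generalized_irf(shock, y0=0.0, horizon=10, reps=2000, seed=0):
    """Average difference between shocked and baseline paths over many draws."""
    rng = np.random.default_rng(seed)
    # Subsequent shocks are shared by the shocked and the baseline path
    future = rng.standard_normal((reps, horizon - 1))
    y_shock = step(np.full(reps, y0), shock)   # period 1 with the shock
    y_base = step(np.full(reps, y0), 0.0)      # period 1 without the shock
    d = [np.mean(y_shock - y_base)]
    for h in range(horizon - 1):
        y_shock = step(y_shock, future[:, h])
        y_base = step(y_base, future[:, h])
        d.append(np.mean(y_shock - y_base))
    return np.array(d)

girf_1 = generalized_irf(shock=1.0)     # one-unit shock
girf_4 = generalized_irf(shock=4.0)     # four-unit shock
geometric = 0.9 ** np.arange(10)        # benchmark decay from y_t = 0.9*y_{t-1}
```

Dividing `girf_4` by 4 puts it on the same scale as `girf_1`; the scaled response to the larger shock stays closer to the geometric benchmark because the larger shock makes a switch into the negative regime less likely.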

Instead of conditioning on a shock of a given size and a particular history, Koop,
Pesaran, and Potter’s (1996) method allows for averaging across shocks of all sizes
and across all histories. For example, instead of setting $\varepsilon_1 = 1$, Panel (b) of Figure 7.12 sets
$\varepsilon_1 = 4$. Notice that the impulse response function is quite close to geometric decay.
The reason is that the large initial value of $\varepsilon_1$ diminishes the likelihood that the series
switches into the negative regime. Although not shown, you should take a little quiz
and ask yourself how Panels (a) and (b) in Figure 7.12 would appear if the shocks
were -1 and -4. The answer should be obvious in that the series would begin in the
negative regime (with rapid decay). Both would decay quickly, but not as quickly as the
process $y_t = 0.1y_{t-1}$ since shocks could switch the series into the positive regime. It is
also possible to plot the responses for different histories (i.e., different initial values).
A common technique is to average the responses over all histories so that the typical
responses to a shock are displayed.


FIGURE 7.12 Impulse Responses from a TAR Model
Panel (a): Responses to a 1-unit Shock; Panel (b): Responses to a 4-unit Shock


Terrorist Incidents with Casualties


A realistic way to capture the nature of terrorist campaigns is to use a two-regime TAR 
model. In relatively tranquil regimes, terrorists can replenish and stockpile resources, 
recruit new members, raise funds, and plan for future attacks. Terrorism can remain low 
until an event occurs that switches the system into the high-terrorism regime. Because 
each terrorist attack utilizes scarce resources, high-terrorism states are not anticipated 
to exhibit a high degree of persistence when a shock raises the level of terrorism. On 
the other hand, periods with little terrorism can be highly persistent to shocks. In order 
to measure the differing persistence across the two states, Enders and Sandler (2002) 
acquired quarterly data on the number of incidents containing one or more casualties 
over the 1968Q1 to 2000Q4 period. You can follow along with the data on the file
TERROR_TYPES.XLS. We first estimated the number of incidents with casualties
(cas) as the linear AR(3) process:


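$$ cas_t = \underset{(2.83)}{5.91} + \underset{(2.98)}{0.261}\,cas_{t-1} + \underset{(3.59)}{0.310}\,cas_{t-2} + \underset{(2.40)}{0.209}\,cas_{t-3} + \varepsilon_t, \qquad \text{AIC} = 1205.72 \qquad (7.27) $$

where $cas_t$ represents the number of incidents with casualties and t-statistics are in parentheses.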

The model appears adequate in that it satisfies the standard diagnostic tests. All 
t-statistics are significant at conventional levels and the point estimates of the autoregressive
coefficients imply stationarity. The results of a Dickey-Fuller test allow us
to reject the null hypothesis of a unit root at the 5% significance level. Moreover, 
the Ljung-Box Q-statistics indicate that the residuals are serially uncorrelated. For 
example, Q-statistics using the first 4, 8, and 12 lags of the residual autocorrelations 
have prob-values of 0.98, 0.52, and 0.72, respectively. 

Correlation coefficients are measures of linear association and may not detect non- 
linearities in the data. If you perform the RESET with H = 3, you should find that the 
prob-value for the test is 0.049. However, with a lag length of 3, Hansen’s threshold
test yields a prob-value of 0.011 with estimates of $\tau = 37$ and $d = 1$. As such, the most
appropriate TAR model in the form of (7.14) is:


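$$ cas_t = \left(\underset{(-0.35)}{-5.38} + \underset{(2.25)}{0.715}\,cas_{t-1} + \underset{(1.04)}{0.204}\,cas_{t-2} - \underset{(-0.54)}{0.094}\,cas_{t-3}\right) I_t $$
$$ \qquad + \left(\underset{(0.64)}{1.46} + \underset{(4.18)}{0.534}\,cas_{t-1} + \underset{(2.76)}{0.258}\,cas_{t-2} + \underset{(2.55)}{0.239}\,cas_{t-3}\right)(1 - I_t) + \varepsilon_t, \qquad \text{AIC} = 1196.05 \qquad (7.28) $$

where the estimates of the threshold and the delay are $\tau = 37$ (so $I_t = 0$ if $cas_{t-1} < 38$)
and $d = 1$.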

Equation (7.28) is clearly over-parameterized, since there are a number of coefficients
with t-statistics less than 1.96 in absolute value. Even though there are a number
of coefficients with very small t-values, the AIC selects the TAR model over the
linear model. (Remember that $\tau$ is also an estimated parameter.) At this point some
researchers might try to pare down the model. However, this can be problematic because
the tabulated t-statistics are only an approximation of the actual distribution since
we searched for the best-fitting threshold function. In terms of (7.14), the distribution
of coefficient $\alpha_{ij}$ depends on the accuracy of the estimated threshold. An alternative
would be to pare down the model using the AIC or SBC.

Diagnostic checking indicates that the model is appropriate. For example, the first 
twelve autocorrelations of the residuals are less than 0.14 in absolute value and the 
prob-values for the Ljung-Box Q(4), Q(8), and Q(12) statistics are 0.98, 0.76, and 0.64, 
respectively. Even though the TAR model contains nine parameters (i.e., eight coefficients
plus $\tau$), the AIC selects it over the linear model.

The threshold model yields very different implications about the behavior of the 
cas, series than the linear model. Since the linear specification makes no distinction 
between high- and low-terrorism states, the degree of autoregressive decay is always 
constant. Regardless of whether the number of incidents is above or below the mean, 
the degree of persistence is quite large; the largest characteristic root of the linear 
model is 0.88. The threshold model, however, indicates that the high-terrorism regime 
is less persistent than the low-terrorism regime. This is consistent with the notion
that terrorism can remain low until an event occurs that switches the system into the 
high-terrorism regime. 

One way to understand the nature of the system is to consider the forecast function. 
As analyzed in Koop, Pesaran, and Potter (1996), the forecasts and impulse responses 
from a nonlinear model are state-dependent. In terms of (7.28), a positive shock when 
$y_{t-1} > 37$ will be less persistent than the same shock when $y_{t-1}$ is far below the threshold.
Since we are interested in comparing short-run and long-run forecasts in the two
states (rather than a generalized impulse response function), we use a modified version 
of Koop, Pesaran, and Potter’s methodology. 

For a model with three lags, we select a particular history for $y_t$, $y_{t-1}$, and $y_{t-2}$.
For example, in the last three quarters of 1985 (a high-terrorism regime), the numbers
of casualty incidents were 33, 50, and 40, respectively. Hence, to forecast the subsequent
number of incidents from the perspective of 1985Q4, we let $y_{t-2} = y_{1985Q2} = 33$,
$y_{t-1} = y_{1985Q3} = 50$, and $y_t = y_{1985Q4} = 40$. We then select 25 randomly drawn realizations
of the residuals of (7.28). Since the residuals may not have a normal distribution,
the residuals are selected using standard bootstrapping procedures. In particular, the
residuals are drawn with replacement using a uniform distribution. Call these residuals
$\varepsilon^*_{t+1}, \varepsilon^*_{t+2}, \ldots, \varepsilon^*_{t+25}$. We then generate $y^*_{t+1}$ through $y^*_{t+25}$ by substituting these
bootstrapped residuals into (7.28) and setting $I_t$ appropriately for high- or low-terrorism
states. In essence, $y^*_{t+1}$ is one possible realization of the $cas_t$ series for 1986Q1, $y^*_{t+2}$ is
one possible realization of the $cas_t$ series for 1986Q2, and so on. For this particular history,
we repeat the process 1,000 times. Under very weak conditions, the Law of Large
Numbers guarantees that the sample average of the 1,000 values of $y^*_{t+1}$ converges to
the conditional mean of $y_{t+1}$, denoted by $E_t y_{t+1}$. Similarly, the sample means of the
various $y^*_{t+i}(k)$, where $y^*_{t+i}(k)$ is the result for draw $k$, converge to the true conditional
i-step-ahead forecasts, that is

$$ \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} y^*_{t+i}(k) = E_t y_{t+i} $$

The essential point is that the sample averages of $y^*_{t+1}$ through $y^*_{t+25}$ yield the
1-step- through 25-step-ahead conditional forecasts of the $cas_t$ series from the perspective
of 1985Q4. Intuitively, because the number of casualty incidents exceeds the
threshold, the value of $cas_t$ should quickly decline from 40 toward the attractor of 31.1.
Nevertheless, the long-run forecast need not equal the attractor, which can be seen
by examining the conditional forecasts (indicated by the solid line) shown in the top
panel of Figure 7.13. Although the expected number of $cas_t$ incidents does decline
toward 31.1, there are two reasons why the long-term forecasts continue to decline.
Since incidents below the threshold are (on average) more persistent than those above,
the system’s mean will be below the attractor. Moreover, the forecasts allow for the
possibility of a regime-switch into the low-terrorism state. As shown in Panel (a) of
the figure, the long-run expected value is about 28.5 casualty incidents per quarter.
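
The bootstrap forecasting procedure can be sketched as below. This is a minimal illustration, not the code used for Figure 7.13. The history (33, 50, 40), the threshold of 37, the 25-step horizon, and the 1,000 replications come from the text; the regime coefficients and the residual series are hypothetical placeholders chosen only so that the example runs, and `tar_forecast_paths` is an assumed helper name.

```python
import numpy as np

def tar_forecast_paths(history, resid, high, low, tau, steps=25, reps=1000, seed=0):
    """Bootstrap multi-step forecasts from a two-regime TAR(3) with delay 1.

    history : last three observations, oldest first (y_{t-2}, y_{t-1}, y_t)
    resid   : estimated residuals to resample from
    high/low: (intercept, a1, a2, a3) used when y_{t-1} > tau / otherwise
    """
    rng = np.random.default_rng(seed)
    paths = np.empty((reps, steps))
    for k in range(reps):
        y = list(history)
        # Draw residuals with replacement, each residual equally likely
        e = rng.choice(resid, size=steps, replace=True)
        for i in range(steps):
            c, a1, a2, a3 = high if y[-1] > tau else low
            y.append(c + a1 * y[-1] + a2 * y[-2] + a3 * y[-3] + e[i])
        paths[k] = y[3:]
    return paths

# Hypothetical inputs for illustration only (not the estimates in the text)
rng = np.random.default_rng(1)
resid = rng.standard_normal(120) * 5.0
resid -= resid.mean()
paths = tar_forecast_paths(history=[33, 50, 40], resid=resid,
                           high=(6.0, 0.5, 0.2, 0.1),
                           low=(2.0, 0.6, 0.2, 0.1), tau=37)
forecasts = paths.mean(axis=0)   # 1-step- through 25-step-ahead conditional forecasts
```

Each row of `paths` is one simulated future; averaging down the columns gives the conditional forecasts, and the regime is re-evaluated at every step so that simulated paths can cross the threshold.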


FIGURE 7.13 Nonlinear Forecasts of Casualty Incidents
Panel (a): Casualty Forecasts from 1985Q4; Panel (b): Casualty Forecasts from 1998Q4

When the number of incidents is high, there is a rapid decline to the threshold, as terrorist
networks cannot maintain high-level, resource-using offensives. A comparison
of the forecasts with the actual number of casualty incidents (the dashed line in the
figure) is instructive. The close fit is remarkable given that the forecasts are not the
successive one-step-ahead forecasts. Instead, the figure traces out the 1-step- through
25-step-ahead forecasts from the perspective of 1985Q4.

In contrast, the numbers of terrorist incidents in the last three quarters of 1998
were quite low: $y_{1998Q2} = 5$, $y_{1998Q3} = 15$, and $y_{1998Q4} = 6$. As shown in Panel (b) of
Figure 7.13, reversion back toward the attractor is quite slow in the low-terrorism state.
In fact, conditional on the history of 1998Q4, the forecasts remain low until the third
quarter of 2001. The forecasts seem to track the actual number of incidents occurring 
through the end of our data set reasonably well and ultimately converge to those for 
Panel (a). In either case, these long-run forecasts are relatively close to the attractor for 
the high-terrorism state (31.1). 


11. UNIT ROOTS AND NONLINEARITY 


Suppose you were convinced that the interest rate spread displays the type of nonlinear 
adjustment given by (7.1). Before estimating the TAR model directly, you might want 
to determine whether the series does revert to a long-run equilibrium value (called an 
attractor). However, the established tests for the presence of an attractor assume a 
linear adjustment process. For example, the Dickey—Fuller (1979) test for a unit root 
uses a linear adjustment process of the form: 


$$ y_t = a_1 y_{t-1} + \varepsilon_t \quad [\text{or } \Delta y_t = \rho y_{t-1} + \varepsilon_t] \qquad (7.29) $$


If the null hypothesis $a_1 = 1$ can be rejected in favor of the alternative $-1 < a_1 < 1$,
it can be concluded that the {$y_t$} sequence decays to the attractor $y^* = 0$. However, if
the {$y_t$} sequence is generated from a nonlinear model, the Dickey-Fuller test might
fail to detect an attractor since it is misspecified. Although (7.29) can be augmented
with deterministic regressors and lagged changes of {$y_t$}, the crucial point to note is
that the dynamic adjustment process is assumed to be linear. The issue is important 
since Pippenger and Goering (1993) and Balke and Fomby (1997) show that tests for 
unit roots have low power in the presence of asymmetric adjustment. After all, (7.29) 
does not appropriately capture the dynamic adjustment process of a nonlinear model. 

Notice that the discussion above is directly applicable to the findings of Michael, 
Nobay, and Peel (1997). Recall that their aim was to determine whether real exchange 
rates should be modeled as ESTAR processes. Nevertheless, the dynamic equation
used to determine whether the U.K.—U.S. real exchange rate was stationary [i.e., 
equation (7.26)] assumes a linear adjustment process. As it turned out, they were able 
to reject the null hypothesis of a unit root. However, in other circumstances, a linear 
test may not be able to detect the presence of an attractor for a nonlinear process. 

To circumvent this problem, there is a large and growing literature designed to test 
for the presence of an attractor in the presence of nonlinear adjustment. For example, 
in Enders and Granger (1998), we generalized the Dickey-Fuller methodology to consider
the null hypothesis of a unit root against the alternative hypothesis of a threshold
autoregressive (TAR) model. A simple version of the TAR model is:


$$ \Delta y_t = I_t \rho_1 (y_{t-1} - \tau) + (1 - I_t) \rho_2 (y_{t-1} - \tau) + \varepsilon_t \qquad (7.30) $$

$$ I_t = \begin{cases} 1 & \text{if } y_{t-1} \geq \tau \\ 0 & \text{if } y_{t-1} < \tau \end{cases} \qquad (7.31) $$


As shown by the phase diagram illustrated in Figure 7.14, when $y_{t-1} = \tau$, $\Delta y_t = 0$.
However, $\Delta y_t$ equals $\rho_1(y_{t-1} - \tau)$ if the lagged value of the series is above $\tau$ and equals
$\rho_2(y_{t-1} - \tau)$ if the lagged value of the series is below $\tau$. The attractor is $\tau$ since $\Delta y_t$ has
an expected value of zero when $y_{t-1} = \tau$. Hence, if $y_{t-1} = a$, $\Delta y_t$ equals the distance ab.

If we use the specification given by (7.30) and (7.31), it is possible to test for an 
attractor even though the adjustment process is nonlinear. Notice that if $\rho_1 = \rho_2 = 0$, the
process is a random walk. A sufficient condition for the {$y_t$} sequence to be stationary
is $-2 < (\rho_1, \rho_2) < 0$. Also notice that the Dickey-Fuller test emerges as the special
case in which $\rho_1 = \rho_2$. If it is possible to reject the null hypothesis $\rho_1 = \rho_2 = 0$, it
can be concluded that there is an attractor. However, as in the Dickey-Fuller test, it
is not possible to use a classical F-statistic to test the null hypothesis $\rho_1 = \rho_2 = 0$.
Instead, critical values of the F-statistic for the null hypothesis $\rho_1 = \rho_2 = 0$ are reported in Table G in
the Supplementary Manual.

If the null hypothesis of nonstationarity is rejected, it is possible to test for symmetric
versus asymmetric adjustment. In particular, if the null is rejected (so that the
sequence has an attractor), then you can perform the test for symmetric adjustment (i.e.,
$\rho_1 = \rho_2$) using a standard F-distribution. If the threshold is unknown (but estimated
consistently using Chan’s method), the conjecture is that you can also use a standard
F-test. However, Hansen shows that, in small samples, the OLS estimates of
$\rho_1$ and $\rho_2$ can have inflated standard errors and poor convergence properties.
To avoid this problem, you can use Hansen’s
(1997) bootstrapping method that was described at the end of Section 5.

FIGURE 7.14 Phase Diagram for the TAR model
(horizontal axis: $y_{t-1}$; vertical axis: $\Delta y_t$; segments labeled $\rho_1(y_{t-1} - \tau)$ and $\rho_2(y_{t-1} - \tau)$)

An alternative to the basic TAR model is to use what we called the momentum
threshold autoregressive (M-TAR) model. Since the exact nature of the nonlinearity
may be unknown, it is possible to allow the adjustment to depend on the change in $y_{t-1}$
(i.e., $\Delta y_{t-1}$) instead of the level of $y_{t-1}$. In this case, the model becomes (7.30) along
with the indicator function

$$ I_t = \begin{cases} 1 & \text{if } \Delta y_{t-1} \geq 0 \\ 0 & \text{if } \Delta y_{t-1} < 0 \end{cases} \qquad (7.32) $$


This variant of the basic model, used by Enders and Granger (1998) and Caner and 
Hansen (1998), allows a variable to display differing amounts of autoregressive decay 
depending on whether it is increasing or decreasing. This specification is especially 
relevant when the adjustment is such that the series exhibits more momentum in one 
direction than the other; the resulting model is called a momentum-threshold autoregressive
(M-TAR) model. The F-statistic for the null hypothesis $\rho_1 = \rho_2 = 0$ using the
M-TAR specification is called $\Phi_\mu$. As there is generally no presumption as to whether
to use the TAR or the M-TAR model, the recommendation is to select the adjustment 
mechanism (7.31) or (7.32) by a model selection criterion such as the AIC or SBC. 

To perform the test, follow these steps: 


STEP 1: If you know the value of $\tau$ (for example, $\tau = 0$), estimate (7.30). Otherwise,
use Chan’s method: for each potential threshold $\tau$, set the indicator function
using (7.31). Estimate (7.30) for each potential threshold value and select the
value of $\tau$ from the regression containing the smallest value for the sum of
squared residuals.

STEP 2: If you are unsure as to the nature of the adjustment process, repeat Step 1
using the M-TAR model. For each potential threshold $\tau$, set the indicator
function using (7.32). Select the value of $\tau$ resulting in the best fit. Use the
AIC or SBC to select the TAR or M-TAR specification.

STEP 3: Use the model selected from Step 1 or Step 2 to calculate the F-statistic
for the null hypothesis $\rho_1 = \rho_2 = 0$. For the TAR model, compare this sample
statistic with the appropriate critical value in Table G. The critical values
depend on sample size (T) and whether you augment the model with lagged
changes. Use Panel (a) if you estimate $\tau$ for a TAR model and Panel (b) if
you estimate $\tau$ for an M-TAR model. If you know the threshold and estimate
an M-TAR model, use Panel (c). In the case of a TAR model with a known
threshold, the test seems to have low power relative to the Dickey-Fuller
test. As such, the critical values for this case are not reported.

STEP 4: If the alternative hypothesis is accepted (i.e., if there is an attractor), it is possible
to test for symmetric versus asymmetric adjustment since the joint
distribution of the estimates of $\rho_1$ and $\rho_2$ converges asymptotically to a multivariate normal. As
such, the restriction that the adjustment is symmetric (i.e., the null hypothesis
$\rho_1 = \rho_2$) can be tested using Hansen’s (1997) bootstrapping method or
using a standard F-test as an approximation.

STEP 5: Diagnostic checking of the residuals should be undertaken to ascertain
whether the estimated {$\varepsilon_t$} series could reasonably be characterized by a
white-noise process. If the residuals are correlated, return to Step 1 and
re-estimate the model in the form:

$$ \Delta y_t = I_t \rho_1 (y_{t-1} - \tau) + (1 - I_t) \rho_2 (y_{t-1} - \tau) + \sum_{i=1}^{p} \alpha_i \Delta y_{t-i} + \varepsilon_t \qquad (7.33) $$

In working with this specification, it is possible to use diagnostic checks
of the residuals and/or the various model selection criteria to determine the
lag length.
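
Steps 1 through 3 can be sketched as a grid search followed by an F-type statistic. This is an illustrative sketch under stated assumptions, not the authors' code: candidate thresholds are taken from the middle 70% of the sorted lagged levels (a trimming choice assumed here), no lagged changes are included, the data are simulated, and `threshold_phi` is a hypothetical name. The resulting statistic must be compared with the nonstandard critical values, not with a standard F table.

```python
import numpy as np

def threshold_phi(y, momentum=False, trim=0.15):
    """Grid-search tau in (7.30) and return (tau, rho1, rho2, phi).

    momentum=False uses the TAR indicator (7.31); momentum=True uses the
    M-TAR indicator (7.32). phi is the F-statistic for rho1 = rho2 = 0.
    """
    dy = np.diff(y)
    dep = dy[1:]          # Delta y_t
    y_lag = y[1:-1]       # y_{t-1}
    dy_lag = dy[:-1]      # Delta y_{t-1}
    n = len(dep)
    cand = np.sort(y_lag)[int(trim * n):int((1 - trim) * n)]
    best = None
    for tau in cand:
        I = (dy_lag >= 0) if momentum else (y_lag >= tau)
        X = np.column_stack([I * (y_lag - tau), (~I) * (y_lag - tau)])
        beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
        resid = dep - X @ beta
        ssr = resid @ resid
        if best is None or ssr < best[0]:      # keep the smallest SSR
            best = (ssr, tau, beta)
    ssr, tau, (rho1, rho2) = best
    ssr_restricted = dep @ dep                 # under the null, Delta y_t = e_t
    phi = ((ssr_restricted - ssr) / 2) / (ssr / (n - 2))
    return tau, rho1, rho2, phi

# Simulated series with asymmetric adjustment around an attractor of zero
rng = np.random.default_rng(0)
y = np.zeros(400)
for t in range(1, 400):
    rho = -0.2 if y[t - 1] >= 0 else -0.6
    y[t] = y[t - 1] + rho * y[t - 1] + rng.standard_normal()

tau, rho1, rho2, phi = threshold_phi(y)                           # TAR
tau_m, rho1_m, rho2_m, phi_m = threshold_phi(y, momentum=True)    # M-TAR
```

Running both specifications on the same series mirrors Step 2; in practice the choice between them would be made with the AIC or SBC, and the lagged changes of (7.33) would be added if the residuals are serially correlated.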


An Example 


Enders and Granger (1998) use quarterly values of the rate on 10-year government securities
($r_{Lt}$) and the federal funds rate ($r_{St}$) over the period 1958Q1 through 1994Q1. You
can find the data used in the study on the file labeled GRANGER.XLS. The issue is to
determine how to model the relationship between the two interest rates. First form the
interest rate spread as $s_t = r_{Lt} - r_{St}$. After a bit of experimentation, the most appropriate
equation for the Dickey-Fuller test is


$$ \Delta s_t = \underset{(1.52)}{0.120} - \underset{(-3.56)}{0.156}\,s_{t-1} + \underset{(1.94)}{0.162}\,\Delta s_{t-1} + \varepsilon_t $$

AIC = 669.79    SBC = 678.68

The coefficient on s,_, has a t-statistic of —3.56; hence, the null hypothesis of a 
unit root can be soundly rejected. Since the point of this section is to illustrate the 
test for threshold adjustment, we can pretend that the results of the Dickey—Fuller test 
are ambiguous. Nevertheless, diagnostic checking reveals that the equation is inade- 
quate. For example, the RESET with H = 3 and H = 4 have prob-values of 0.0016 and 
0.00009, respectively. Hence there is substantial evidence of neglected nonlinearity. 

Next, estimate a TAR model in the form of (7.30) and (7.31). The value of $\tau$
yielding the best fit is -0.27 so that the resulting TAR model of the spread is

$$ \Delta s_t = \underset{(-1.59)}{-0.066}\,I_t (s_{t-1} + 0.27) - \underset{(-3.67)}{0.286}\,(1 - I_t)(s_{t-1} + 0.27) + \underset{(2.07)}{0.172}\,\Delta s_{t-1} + \varepsilon_t $$

AIC = 669.12    SBC = 680.97

For Step 2, we can estimate an M-TAR model by replacing (7.31) with (7.32). The
value of $\tau$ yielding the best fit is 1.64 so that the resulting M-TAR model of the spread
is

$$ \Delta s_t = \underset{(-4.75)}{-0.299}\,I_t (s_{t-1} - 1.64) - \underset{(-0.145)}{0.007}\,(1 - I_t)(s_{t-1} - 1.64) + \underset{(1.183)}{0.016}\,\Delta s_{t-1} + \varepsilon_t $$

AIC = 662.55    SBC = 674.40

Notice that the AIC and the SBC both select the M-TAR model even though it has
two coefficients that appear to be statistically insignificant. You might want to experiment
and estimate the model without these two extraneous coefficients. The F-statistic
for the null hypothesis that $\rho_1 = \rho_2 = 0$ is 11.44. If we compare this to the critical values
for $\Phi_\mu$, we can reject the null hypothesis of no attractor. As such, we can test whether
the adjustment is symmetric or asymmetric. The F-statistic for the null hypothesis
$\rho_1 = \rho_2$ is 12.24 with a prob-value of 0.0006. Hence, we can conclude that the M-TAR
model best captures the adjustment process of the interest rate spread. The point estimates
suggest that the equilibrium value of the spread is 1.64. When the spread is increasing
(i.e., when $\Delta s_{t-1} > 0$), the speed of adjustment is fairly rapid. However, when the
spread is decreasing (so that the long-term rate is falling relative to the federal funds
rate), the adjustment of -0.007 is almost nonexistent. This is in contrast to the linear
model; the linear model suggests that the speed of adjustment is -0.156 regardless of
whether the spread is increasing or decreasing. Moreover, the linear specification suggests
that the long-run equilibrium value of the spread is zero since the intercept has a
t-statistic of 1.52.


NONLINEAR ERROR-CORRECTION If you experiment with the data set, you
will find that both $r_{Lt}$ and $r_{St}$ act as I(1) processes. Since a linear combination
of these two I(1) variables is stationary, the Granger representation theorem indicates
that there is an error-correction model. However, there is nothing requiring that the
dynamic adjustment mechanism be linear. Instead, it seems plausible that the
error-correction model has the M-TAR form:


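$$ \Delta r_{Lt} = \underset{(0.766)}{-0.03}\,I_t (s_{t-1} - 1.64) - \underset{(2.11)}{0.07}\,(1 - I_t)(s_{t-1} - 1.64) + A_{11}(L)\Delta r_{L,t-1} + A_{12}(L)\Delta r_{S,t-1} + \varepsilon_{1t} $$
$$ F_{11} = 0.087 \qquad F_{12} = 0.521 $$
$$ \Delta r_{St} = \underset{(2.67)}{0.21}\,I_t (s_{t-1} - 1.64) - \underset{(-0.67)}{0.04}\,(1 - I_t)(s_{t-1} - 1.64) + A_{21}(L)\Delta r_{L,t-1} + A_{22}(L)\Delta r_{S,t-1} + \varepsilon_{2t} $$


where t-statistics are in parentheses, two lags of each variable are used in each equation,
$F_{ij}$ is the prob-value for the hypothesis that all coefficients in the polynomial $A_{ij}(L)$ equal zero, and $I_t$ is the
M-TAR indicator given by (7.32).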

The t-statistics suggest an interesting adjustment process towards the long-run
equilibrium. Increases in the spread tend to be accompanied by changes in the federal
funds rate while decreases are accompanied by changes in the 10-year rate. When the
spread is increasing (i.e., if $\Delta s_{t-1} > 0$), we would expect the federal funds rate to increase
by 21% of the discrepancy between $s_{t-1}$ and the long-run value of 1.64. When the
spread is decreasing, the long-term rate declines by 7% of the discrepancy.

The linear error-correction model tells a very different story. If we use the type of 
linear error-correction model used in Chapter 6, we obtain 


$$ \Delta r_{Lt} = \underset{(-3.30)}{-0.114}\,\hat{e}_{t-1} + A_{11}(L)\Delta r_{L,t-1} + A_{12}(L)\Delta r_{S,t-1} + \varepsilon_{1t}, \qquad F_{11} = 0.062, \quad F_{12} = 0.288 $$

$$ \Delta r_{St} = -0.002\,\hat{e}_{t-1} + A_{21}(L)\Delta r_{L,t-1} + A_{22}(L)\Delta r_{S,t-1} + \varepsilon_{2t} $$

where $\hat{e}_{t-1}$ is the residual from the regression of $r_{Lt}$ on a constant and $r_{St}$. Hence,
$\hat{e}_{t-1}$ is the estimated deviation from the long-run relationship obtained by using the
Engle-Granger technique.

In contrast to the threshold model, the linear model implies that only the 10-year 
interest rate responds to the discrepancy from the long-run equilibrium. 


12. MORE ON ENDOGENOUS STRUCTURAL BREAKS


In general, models with breaks are not considered to be nonlinear. However, breaking 
and nonlinear models both involve the problem of unidentified nuisance parameters 
under the null hypothesis, the so-called Davies problem. To make the point, recall
that it is straightforward to perform the type of Chow test discussed in Section 12 of
Chapter 2 when the date of a potential break is known. For example, in equation (7.34)
the dummy variable $D_t$ represents a structural break in the intercept term occurring at
time period $t^*$. In (7.35) the break affects the intercept and all of the autoregressive
coefficients.


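$$ y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \gamma_0 D_t + \varepsilon_t \qquad (7.34) $$

$$ y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \left(\gamma_0 + \sum_{i=1}^{p} \gamma_i y_{t-i}\right) D_t + \varepsilon_t \qquad (7.35) $$

where $D_t$ is the Heaviside indicator for a break occurring at time period $t^*$.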

Equation (7.34) is a partial break model where the break is assumed to affect only 
the intercept whereas (7.35) is a pure break model in that all parameters are allowed to 
change. The test for structural change (i.e., that the coefficients on $D_t$ are zero) can be conducted using a t-test in
(7.34) or an F-test in (7.35).

The situation is far more complicated when the break date is unknown. Typically,
the researcher will estimate a regression in the form of (7.34) or (7.35) for every possible
break date and select the one with the best fit. The methodology is reasonable since the 
best-fitting regression does yield a consistent estimate of t*. However, it is no longer 
appropriate to use a t-test (or F-test) to test for the presence of a structural break. Since 
the best fitting regression, out of a large number of regressions, is selected, the distri- 
bution of the test statistic is nonstandard. Moreover, under the null hypothesis of no 
structural change, t* is unidentified. 

Andrews (1993) and Andrews and Ploberger (1994) develop a test that can be
used to test for a single structural break occurring at an unknown date. Recall that a
single-break model is a threshold model with time as the threshold variable. As such,
you can estimate (7.34) or (7.35) by performing a grid search for the best-fitting break
date. The test is feasible since the selection of the best-fitting regression amounts to a
supremum test.* As in the threshold model, it is necessary to “trim” the data so that each
segment of the breaking series has a sufficient number of observations to be properly
estimated. The conventional practice is to use a trimming value $\varepsilon = 0.15$ so that each
regime contains at least 15% of the observations. (With large samples, a number of
researchers use $\varepsilon = 0.10$.) Under the null hypothesis of no break, the distribution of the
Andrews (1993) test depends on the number of breaking parameters and the trimming.
If you use a 15% trimming, with 1, 2, and 3 breaking parameters, the 5% asymptotic $\chi^2$
critical values are 8.85, 11.79, and 14.15, respectively. However, with the sample sizes
typically used in applied work, it makes sense to use Hansen’s (1997) bootstrapping test
for a threshold model. After all, a single structural break is a threshold model where
time is the threshold variable. If you are paying attention, you will note that this is
precisely the test that was analyzed in Section 5 using the file Y_BREAK.XLS. Note
that the Andrews (1993) test does not require that the variance of the error term $\varepsilon_t$ be
the same in each period; as such, you can also use the method to test for threshold breaks
in the form of
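$$y_t = \begin{cases} a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \varepsilon_{1t} & \text{if } t > t^* \\ \gamma_0 + \sum_{i=1}^{p} \gamma_i y_{t-i} + \varepsilon_{2t} & \text{if } t \le t^* \end{cases}$$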
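The grid search described above can be sketched in a few lines. The following is only a minimal illustration, assuming an AR(1) with a break in the intercept, as in (7.34) with p = 1; the function names, the simulated data, and the error standard deviation of 0.2 are hypothetical and are not taken from the text. The sample sup F value it produces would be compared with a nonstandard critical value (such as the 8.85 reported above for one breaking parameter and 15% trimming) or with a bootstrapped distribution, not with the usual F table.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def sup_f_intercept_break(y, trim=0.15):
    """Grid search for one intercept break in an AR(1), as in (7.34) with p = 1.

    Returns (best_date, sup_f, f_stats). Dates are 0-based positions in y,
    and the dummy equals 1 from the candidate date onward.
    """
    y = np.asarray(y, dtype=float)
    dep, lag = y[1:], y[:-1]                  # y_t and y_{t-1}
    n = dep.size
    X0 = np.column_stack([np.ones(n), lag])   # model without a break
    ssr0 = ssr(X0, dep)
    first = int(np.ceil(trim * n))            # trimming keeps both regimes
    last = int(np.floor((1 - trim) * n))      # from becoming too short
    f_stats = {}
    for j in range(first, last + 1):
        dummy = (np.arange(n) >= j).astype(float)
        X1 = np.column_stack([X0, dummy])
        ssr1 = ssr(X1, dep)
        # F-statistic for one restriction (the coefficient on the dummy is zero)
        f_stats[j + 1] = (ssr0 - ssr1) / (ssr1 / (n - X1.shape[1]))
    best_date = max(f_stats, key=f_stats.get)
    return best_date, f_stats[best_date], f_stats

# Illustrative data (hypothetical, not the Y_BREAK.XLS series)
rng = np.random.default_rng(0)
T, true_break = 150, 100
y = np.zeros(T)
for t in range(1, T):
    intercept = 1.0 if t < true_break else 4.0
    y[t] = intercept + 0.5 * y[t - 1] + 0.2 * rng.standard_normal()

best_date, sup_f, f_stats = sup_f_intercept_break(y)
print(best_date, round(sup_f, 2))
```

Because the largest of many F-statistics is selected, the reported sup F is the maximum over all admissible dates; this is exactly why its distribution is nonstandard.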


Bai and Perron (1998) generalize the Andrews (1993) test by allowing for multiple 
structural breaks. Consider the following autoregressions with k breaks (k + 1 regimes): 


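$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + (\gamma_1 D_{1t} + \gamma_2 D_{2t} + \cdots + \gamma_k D_{kt}) + \varepsilon_t \tag{7.36}$$

$$y_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{j=1}^{k} D_{jt}\left(\gamma_{j0} + \sum_{i=1}^{p} \gamma_{ji} y_{t-i}\right) + \varepsilon_t \tag{7.37}$$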


In the partial break model of (7.36), the breaks are confined to the intercept whereas
(7.37) is the pure break model. In testing for breaks, Bai and Perron (2003) recommend
using a trimming value of $\varepsilon = 0.15$ and setting the maximum number of breaks k = 5.
Moreover, it is necessary to specify the minimum break size as the minimum number of
observations between breaks. With quarterly data it is usual to specify that a break lasts
at least a year or two. A large change in a series that lasts only a few periods is more
likely to represent an outlier than true structural change. It should be pointed
out that (7.36) and (7.37) require homogeneity in the regression errors. Of course, this
assumption can be problematic in your data set if the variance of $\{\varepsilon_t\}$ changes in the
presence of a break.
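To see how the minimum break size restricts the search, the sketch below enumerates every admissible combination of break dates by brute force. It is an illustration only: the function name is hypothetical, a break date is taken to be the last observation of a regime, and a single minimum segment length h stands in for both the trimming and the minimum break size (how a particular software package combines the two is not specified here).

```python
from itertools import combinations

def admissible_breaks(T, k, h):
    """Yield every k-tuple of break dates (last observation of each regime,
    1-based) such that all k + 1 regimes contain at least h observations."""
    for dates in combinations(range(h, T - h + 1), k):
        edges = (0,) + dates + (T,)
        if all(b - a >= h for a, b in zip(edges, edges[1:])):
            yield dates

# Small illustration: 10 observations, minimum regime length of 3
one_break = list(admissible_breaks(10, 1, 3))
two_breaks = list(admissible_breaks(10, 2, 3))
print(one_break)    # candidate dates for a single break
print(two_breaks)   # candidate pairs of dates for two breaks

# The number of combinations grows quickly with the sample size and with k
n_combos = sum(1 for _ in admissible_breaks(40, 2, 8))
print(n_combos)
```

Since every combination requires its own regression, the count grows rapidly with the number of breaks, which is why an efficient search algorithm matters in practice.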

Bai and Perron (2003) develop an algorithm that can efficiently estimate (7.36) or
(7.37) for every possible combination of break dates. There are two different ways to
select the number of breaks. First, they develop a supremum test for the null hypothesis
of no structural change versus the alternative hypothesis of k breaks. In essence,
their algorithm is used to estimate models for every possible combination of breaks
(given the minimum break size and maximum number of possible breaks) and then
selects the best-fitting combination of break dates. The F-statistic for the null hypothesis
of no breaks against the alternative of k breaks is nonstandard. However, the critical
values for these supF(k; q) statistics are calculated by Bai and Perron (1998). The
notation is designed to highlight the fact that the critical values depend on the number of
breaks, k, and the number of breaking parameters, q. If the null hypothesis of no breaks
is rejected, it is standard to select the actual number of breaks using a model selection
criterion such as the SBC. The AIC is not recommended as it selects too many breaks.
For q = 1, 2, and 3, the 95% critical values for 1, 2, and 5 breaks are:


q     k = 1    k = 2    k = 5    UDmax

1      9.63     8.78     6.69    10.17
2     12.89    11.60     9.12    13.27
3     15.37    13.84    11.15    16.82


For the supF(k; q) statistic you need to specify the value of k. However, it also
seems reasonable to test the null of no breaks against the more general alternative of
some breaks. If the largest of the sample supF(k; q) statistics for k = 1, 2, 3, ... exceeds
the UDmax critical value reported above, you can conclude that there are some breaks and
then go on to select the number of breaks using the SBC.

The second method of selecting the number of breaks is to use a sequential test.
Begin with the null hypothesis of no breaks versus the alternative of a single break. If
the null hypothesis of no breaks is rejected, proceed to test the null of a single break
versus the alternative of two breaks, and so forth. This process is repeated until the
test fails to reject the null hypothesis of no additional breaks. The method is sequential
in that the test for $\ell + 1$ breaks takes the first $\ell$ breaks as given. At each stage,
the so-called $F(\ell + 1 \mid \ell)$ statistic is calculated as the maximum F-statistic for the null
hypothesis of no additional breaks against the alternative of one additional break. For
q = 1, 2, and 3, the 95% critical values for $\ell$ = 0, 1, 2, and 4 are:


q     $\ell = 0$    $\ell = 1$    $\ell = 2$    $\ell = 4$

1       9.63         11.14         12.16         13.45
2      12.89         14.50         15.42         16.61
3      15.37         17.15         17.97         19.23


The sequential method can work poorly if the series is highly persistent or if the
breaks tend to be offsetting. As such, if the null of $\ell = 0$ is not rejected against the
alternative of $\ell = 1$, use the UDmax test. Then, if the null of no breaks is rejected, assume
there is at least one break and go on to select additional breaks using the sequential
method.


Two Examples 


Example 1 Recall (see Section 12 of Chapter 2 and Section 5 of this chapter)
that the 150 observations in the file Y_BREAK.XLS were constructed so as to have a
break at t = 101. Suppose that we have no idea how the series was constructed and
apply the Andrews test (also called the Andrews-Quandt test) for a single break.
Allowing for a possible break in the variance as well as in the coefficients, if you use
trimming of 0.15, you should find:


                     Test Statistic    Prob-Value    Break Date
Constant                 28.18           0.000          100
$y_{t-1}$                42.47           0.000          100
All coefficients         42.58           0.000          100
Residual variance         2.81           0.595


The sample values for a break in the intercept, the autoregressive coefficient, and
for both coefficients are 28.18, 42.47, and 42.58, respectively. These are all highly
significant; as such, we can reject the null hypotheses that the intercept is constant, the
autoregressive coefficient is constant, and that both coefficients are constant. Notice
that we cannot reject the null hypothesis that the variance of the residuals is constant.
Moreover, the estimated break date is exactly correct.

Now suppose that we are concerned that there might be more than one break and 
employ the Bai-Perron test. Again, use a 0.15 trimming, assume that the minimum 
break size is eight periods, and use a maximum of five breaks. Consider the output using 
the pure break specification (so that we allow both the intercept and the autoregressive 
coefficient to change): 


Breaks    supF(k; 2)    $F(\ell + 1 \mid \ell)$    SBC
0                                                  0.337
1           29.57            29.57                 0.062
2           16.40             2.59                 0.094
3           12.51             3.56                 0.112
4           10.13             2.29                 0.146
5            9.16             3.71                 0.161


Given that we have two breaking parameters, if we use the supF(5; 2) test, the
sample value of 9.16 exceeds the 95% critical value of 9.12. As such, we barely reject
the null hypothesis of no breaks against the alternative of five breaks. However, the
sample values of the supF(k; 2) statistics drop off rapidly starting from 29.57 and falling
to 16.40 and ultimately to 9.16. If instead we test the null of no breaks against the
alternative of exactly two breaks, the sample value of 16.40 clearly exceeds the 95%
critical value of 11.60. This illustrates the point that estimating a model with five breaks
in two coefficients entails the estimation of twelve parameters with a commensurate
loss in the power of the test. The UDmax test is quite definitive that there are some
breaks in the series. The largest supF(k; 2) statistic occurs with a single break such that
supF(1; 2) is 29.57. If we compare this to the 95% UDmax critical value of 13.27, we
reject the null of no breaks and conclude that there are some breaks. Notice that the
SBC correctly selects the model with one break.
If instead we use the sequential method, the sample value of F(1|0) = 29.57 far
exceeds the critical value of 12.89. Given that we reject the null hypothesis of no
breaks and accept the alternative of one break, we now consider whether there is a
second break. The sample value of F(2|1) of 2.59 does not exceed the critical value of
14.50 so that we do not reject the null hypothesis and conclude that there is a single
break at t = 100.
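The decision rules applied in Example 1 can be written out as a short sketch. The critical values and sample statistics below are transcribed from the tables in the text; the function and variable names are hypothetical. Note that the text does not tabulate a sequential critical value for three breaks under the null, so the sketch stops with an error if that value is ever needed.

```python
# 95% critical values transcribed from the tables in the text
SUPF_CRIT = {1: {1: 9.63, 2: 8.78, 5: 6.69},            # SUPF_CRIT[q][k]
             2: {1: 12.89, 2: 11.60, 5: 9.12},
             3: {1: 15.37, 2: 13.84, 5: 11.15}}
UDMAX_CRIT = {1: 10.17, 2: 13.27, 3: 16.82}              # UDMAX_CRIT[q]
SEQ_CRIT = {1: {0: 9.63, 1: 11.14, 2: 12.16, 4: 13.45},  # SEQ_CRIT[q][l]
            2: {0: 12.89, 1: 14.50, 2: 15.42, 4: 16.61},
            3: {0: 15.37, 1: 17.15, 2: 17.97, 4: 19.23}}

def udmax_rejects(sup_f, q):
    """True if the largest sample supF(k; q) exceeds the UDmax critical value."""
    return max(sup_f.values()) > UDMAX_CRIT[q]

def sequential_breaks(seq_f, q):
    """Count breaks with the sequential rule; seq_f[l] is the sample F(l+1 | l)."""
    l = 0
    while l in seq_f:
        if l not in SEQ_CRIT[q]:
            raise KeyError(f"no critical value tabulated in the text for l = {l}")
        if seq_f[l] <= SEQ_CRIT[q][l]:
            break                      # fail to reject: stop with l breaks
        l += 1
    return l

def sbc_choice(sbc):
    """Number of breaks with the smallest SBC."""
    return min(sbc, key=sbc.get)

# Sample values from the Example 1 output (pure break model, q = 2)
sup_f = {1: 29.57, 2: 16.40, 3: 12.51, 4: 10.13, 5: 9.16}
seq_f = {0: 29.57, 1: 2.59, 2: 3.56, 3: 2.29, 4: 3.71}
sbc = {0: 0.337, 1: 0.062, 2: 0.094, 3: 0.112, 4: 0.146, 5: 0.161}

print(udmax_rejects(sup_f, q=2), sequential_breaks(seq_f, q=2), sbc_choice(sbc))
```

All three rules point to a single break for this series, in line with the discussion above.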


Example 2 Consider the transnational terrorism series shown in Panel (b) of 
Figure 5.1. If you open the file TERRORISM.XLS and examine the series carefully, it 
appears that the mean of the series spikes with the demise of the Soviet Union in 1991 
and falls in 1997. There are several other possible break dates that may or may not be 
significant. If you experiment a bit, you should find that the following AR(2) model of 
the transnational series works quite well: 


$$\mathit{trans}_t = \underset{(3.06)}{4.91} + \underset{(5.29)}{0.381}\,\mathit{trans}_{t-1} + \underset{(5.84)}{0.42}\,\mathit{trans}_{t-2}$$


However, the model is misspecified if there are breaks in the series. Although
(7.37) is the most general model, with five breaks the pure break model requires the
estimation of eighteen parameters (six values each for the intercept and the two
autoregressive coefficients). As such, many researchers estimate the partial break model
and confine the breaks to affect only the intercept. If you estimate an equation in the
form of (7.36) and allow for a maximum of 5 breaks such that each break is at least two
years long, you should find:


Breaks    supF(k; 1)    $F(\ell + 1 \mid \ell)$    SBC

0                                                  4.519
1            8.34             8.34                 4.499
2           10.76            21.53                 4.454
3           10.57            31.72                 4.429
4            9.76            39.04                 4.420
5            8.92            44.59                 4.422


With one breaking coefficient (i.e., q = 1) the 95% critical value for five breaks
(i.e., k = 5) is 6.69. Since the sample value of supF(5; 1) is 8.92, we can reject the null
hypothesis of no breaks. This result agrees with the UDmax test. The largest sample
value of supF(k; 1) occurs with k = 2. Since 10.76 exceeds the 95% critical value of
10.17, we can conclude that there are some breaks in the series. If you examine the last
column in the table above, you will see that the SBC selects a model with four breaks.

Alternatively, we could use the sequential method. The sample value for the null
hypothesis of no breaks (i.e., $\ell = 0$) against the alternative of one break (i.e., $\ell = 1$)
is 8.34. Since the 95% critical value of the $F(\ell + 1 \mid \ell)$ test is 9.63, we do not reject
the null hypothesis and conclude that there are not any breaks. However, recall that this
test may not work well if the breaks are offsetting or if the data are persistent. As such,
we rely on the UDmax test that indicates that there is at least one break. To determine
if there is a second break, note that the sample value of F(2|1) is 21.53. Since this value
exceeds 11.14, we reject the null hypothesis of only one break and accept the alternative
of two breaks. Repeating the process, the sequential method indicates a total of
five breaks.

Before proceeding, note that most software packages reparameterize (7.36) such
that:

$$y_t = \sum_{i=1}^{p} a_i y_{t-i} + (\gamma_0 D_{0t} + \gamma_1 D_{1t} + \gamma_2 D_{2t} + \cdots + \gamma_k D_{kt}) + \varepsilon_t$$

As such, the intercept is $\gamma_0$ for $t < t_1$, $\gamma_1$ for $t_1 \le t < t_2$, and so on until the intercept
is $\gamma_k$ for $t \ge t_k$. Now, if we estimate the partial break model using four breaks, we find:

$$\mathit{trans}_t = \underset{(4.45)}{10.28} D_{0t} + \underset{(6.76)}{18.92} D_{1t} + \underset{(6.05)}{35.29} D_{2t} + \underset{(6.50)}{16.04} D_{3t} + \underset{(5.00)}{8.64} D_{4t} + \underset{(1.88)}{0.150}\,\mathit{trans}_{t-1} + \underset{(3.42)}{0.248}\,\mathit{trans}_{t-2}$$

Breakpoint    Lower 95%    Upper 95%
1975Q3        1970Q4       1979Q1
1992Q3        1992Q1       1994Q1
1994Q3        1994Q2       1995Q1
1997Q4        1996Q2       1999Q3


The first breakpoint is 1975Q3 so that the intercept (the coefficient on $D_{0t}$) is 10.28 for
t < 1975Q3. Note that the 95% confidence interval is such that the break could have come as
early as 1970Q4 or as late as 1979Q1. Given that the confidence interval is so wide, we can
surmise that the break date is not well estimated. The next break occurs at 1992Q3
so the intercept increases from 10.28 to 18.92 beginning in 1975Q4 and lasting until
1992Q3. Notice that the confidence interval is quite tight. Break 3 occurs at 1994Q3;
the length between breaks is just equal to the minimum break size of eight. If you look
at Figure 5.1, you can see that the 1992Q4-1994Q3 period represents a spike in the
terrorism series. The intercept falls to 16.04 after 1994Q3 and falls again after 1997Q4.


Nonlinear Breaks 


In contrast to what the dummy variable approach assumes, structural breaks may be
smooth. When you use a dummy variable, you are implicitly assuming that the break fully
manifests itself at date $t^*$. However, it may take a while for the effect of a change to
have its complete influence on the variable of interest. Although oil price shocks are quite
sharp, it generally takes several quarters for the full impact to be felt on aggregate output
and employment. Moreover, some changes actually occur gradually. There is no doubt that
computers have changed the way many business activities are conducted. However, there is no
clear date at which the computer revolution can be said to have started. Instead, the
technological changes spawned by improvements in computer hardware and software
occurred gradually over time. The point is that breaks, and their effects, need not occur
at one particular point in time. As such, a number of researchers have been working on
models that allow for smooth breaks. Consider the simple modification of the LSTAR
model in (7.19) and (7.20):


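$$y_t = a_0 + a_1 y_{t-1} + \cdots + a_p y_{t-p} + \theta[\beta_0 + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p}] + \varepsilon_t$$


where


$$\theta = [1 + \exp(-\gamma(t - t^*))]^{-1} \tag{7.38}$$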


In (7.38) the transition variable is time, t, and the centrality parameter is $t^*$. When
t is far below $t^*$, the process is given by $y_t = a_0 + a_1 y_{t-1} + \cdots + a_p y_{t-p} + \varepsilon_t$ and when
t is far above $t^*$, the process is given by $y_t = (a_0 + \beta_0) + (a_1 + \beta_1) y_{t-1} + \cdots + \varepsilon_t$. Hence, as
time progresses, the value of $\theta$ goes from zero to unity so that the coefficients of the
series evolve smoothly instead of breaking sharply.


Estimates of a Logistic Break 


The 250 observations of the series shown by the dashed line in Panels (a) and (b) of 
Figure 7.15 were created as 


$$y_t = 1 + 3/[1 + \exp(-0.075(t - 100))] + 0.5 y_{t-1} + \varepsilon_t \tag{7.39}$$


so that the transition variable is t and the centrality parameter is 100. The series is
contained in the file LSTARBREAK.XLS.
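A series in the form of (7.39) is easy to simulate. The sketch below is only an illustration: the error standard deviation of 1, the starting value, and the seed are assumptions, so the draws will not reproduce the series in LSTARBREAK.XLS. It also computes the level around which the series settles at a given date, obtained by dividing the time-varying intercept by (1 - 0.5).

```python
import numpy as np

def theta(t, gamma=0.075, center=100.0):
    """Logistic transition function with time as the transition variable."""
    return 1.0 / (1.0 + np.exp(-gamma * (t - center)))

def local_mean(t):
    """Level implied by (7.39) if the intercept were frozen at its date-t value."""
    return (1.0 + 3.0 * theta(t)) / (1.0 - 0.5)

def simulate_logistic_break(T=250, sigma=1.0, seed=0):
    """Simulate T observations of (7.39); sigma and seed are assumptions."""
    rng = np.random.default_rng(seed)
    y = np.empty(T + 1)
    y[0] = 2.0                                   # start near the initial level
    for t in range(1, T + 1):
        y[t] = 1.0 + 3.0 * theta(t) + 0.5 * y[t - 1] + sigma * rng.standard_normal()
    return y[1:]

y = simulate_logistic_break()
print(len(y), theta(100.0), round(local_mean(1), 3), round(local_mean(250), 3))
```

At t = 100 the transition function equals one half, so half of the eventual shift in the intercept has occurred by the centrality date.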

[FIGURE 7.15 A Simulated LSTAR Break. Panel (a): Bai-Perron Breaks; Panel (b): Logistic Break]

Notice that the break affects only the intercept term in that the autoregressive
parameter is always 0.5. As you can see in the figure, when t is far below 100, the
series fluctuates around 2 and when t is large the series fluctuates around 8. Although
the centrality parameter is 100, the smooth break means that the series starts to display
an increase around t = 75 and seems to level off at around t = 125. If you estimated the
series using the Bai-Perron procedure with a maximum of four breaks, you would find four
breaks. The problem with the Bai-Perron method here is that it uses only sharp breaks. As
shown by the solid line in Panel (a) of Figure 7.15, the method has to employ a step
function in order to approximate the single smooth break. Consider the estimated model
using a minimum span of 8 observations between breaks:


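$$y_t = \underset{(4.43)}{0.71} + \underset{(7.54)}{1.67} D_{1t} + \underset{(7.57)}{2.96} D_{2t} + \underset{(9.92)}{4.19} D_{3t} + \underset{(10.52)}{4.99} D_{4t} + \underset{(6.70)}{0.38} y_{t-1}$$


where all $D_{it} = 1$ except $D_{1t} = 0$ if t < 52, $D_{2t} = 0$ if t < 91, $D_{3t} = 0$ if t < 103, and
$D_{4t} = 0$ if t < 142.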

In order to use Teräsvirta’s (1994) pretest for an LSTAR break, use the same
methodology discussed in Section 7 and let $\theta = [1 + \exp(-\gamma(t - t^*))]^{-1} = [1 + \exp(-h_t)]^{-1}$.
The next step is to take a third-order Taylor series expansion of $\theta$
with respect to $h_t$ evaluated at $h_t = 0$. From the derivation of (7.21), we know that,
apart from a constant that is absorbed into the intercept, the expansion has the form

$$\theta = 0.25 h_t - h_t^3/48$$


Here, $h_t$ is $\gamma(t - t^*)$ so the model can be approximated by:

$$y_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + 0.5 y_{t-1} + \varepsilon_t \tag{7.40}$$


You can estimate (7.40) and test the restriction $a_1 = a_2 = a_3 = 0$ or perform the
LM version of the test. The LM version of the test for a logistic break involves regressing
$y_t$ on a constant and $y_{t-1}$ and saving the residuals. Given that time is the threshold
variable, the estimated auxiliary equation involves regressing the residuals $\hat{e}_t$ on a
constant, $y_{t-1}$, $t$, $t^2$, and $t^3$:


$$\hat{e}_t = \underset{(0.15)}{0.04} - \underset{(-7.22)}{0.39}\, y_{t-1} - \underset{(-0.50)}{4.9 \times 10^{-3}}\, t + \underset{(3.33)}{3.38 \times 10^{-4}}\, t^2 - \underset{(-3.98)}{1.06 \times 10^{-6}}\, t^3$$


The F-statistic for the null hypothesis that the coefficients on $t$, $t^2$, and $t^3$ jointly
equal zero is 24.61. With three numerator degrees of freedom, this is significant at any
conventional level. Next, if you use nonlinear least squares to estimate a model in the
form of (7.39) you should obtain


$$y_t = \underset{(3.98)}{0.72} + \underset{(7.49)}{0.43}\, y_{t-1} + \underset{(8.65)}{3.88}/[1 + \exp(-\underset{(5.15)}{0.065}(t - \underset{(28.79)}{97.48}))]$$


The point estimates are all quite reasonable and you can verify that the residuals
show no evidence of remaining serial correlation. The fitted time path of the
time-varying intercept is shown by the solid line in Panel (b) of Figure 7.15. Additional
details of estimating this series are given in Section 3.8 of the Programming Manual
that accompanies this text. Gonzalez and Teräsvirta (2008) contains an excellent
example of modeling a seasonal series with smooth shifts.
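The two steps of this subsection, the pretest based on powers of time and the nonlinear estimation of the logistic break, can be sketched as follows. This is a rough illustration under stated assumptions, not the procedure used to produce the estimates above: the data are simulated with a unit error variance, time is rescaled before taking powers (which leaves the F-statistic unchanged), and a simple grid search over the smoothness and centrality parameters replaces a nonlinear least squares routine. All names are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and the sum of squared residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

def logistic(t, gamma, center):
    return 1.0 / (1.0 + np.exp(-gamma * (t - center)))

# Simulated data in the form of (7.39); the unit error variance is an assumption
rng = np.random.default_rng(1)
T = 250
time = np.arange(1, T + 1, dtype=float)
y = np.empty(T + 1)
y[0] = 2.0
for t in range(1, T + 1):
    y[t] = 1.0 + 3.0 * logistic(t, 0.075, 100.0) + 0.5 * y[t - 1] + rng.standard_normal()
dep, lag = y[1:], y[:-1]
ones = np.ones(T)

# Step 1: pretest. The restricted model contains a constant and y_{t-1} only.
X0 = np.column_stack([ones, lag])
_, ssr0 = ols(X0, dep)
s = time / T                                   # rescaled time for numerical stability
X1 = np.column_stack([ones, lag, s, s**2, s**3])
# Regressing the restricted residuals on X1 yields the same SSR as regressing
# y_t on X1, because the columns of X0 are contained in X1.
_, ssr1 = ols(X1, dep)
f_stat = ((ssr0 - ssr1) / 3) / (ssr1 / (T - X1.shape[1]))

# Step 2: for fixed (gamma, center) the model is linear, so search a grid and
# estimate the remaining coefficients by OLS at each grid point.
best = None
for gamma in np.linspace(0.02, 0.30, 29):
    for center in np.arange(60.0, 141.0):
        X = np.column_stack([ones, lag, logistic(time, gamma, center)])
        beta, ssr_grid = ols(X, dep)
        if best is None or ssr_grid < best[0]:
            best = (ssr_grid, gamma, center, beta)

ssr_nls, gamma_hat, center_hat, beta_hat = best
print(round(f_stat, 2), round(gamma_hat, 3), center_hat, np.round(beta_hat, 2))
```

The grid search does not deliver standard errors; it is meant only to show why the problem becomes linear once the two transition parameters are held fixed.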
13. SUMMARY AND CONCLUSIONS


Many important economic variables exhibit nonlinear behavior. The difficulty is to 
properly capture the form of the nonlinearity. Once you abandon the linear framework, 
you must address the specification problem. As surveyed in this chapter, there are many 
nonlinear models and there is no clear way to decide which nonlinear specification is 
the best. The issue is important since the use of an incorrect nonlinear specification may 
be worse than ignoring the nonlinearity. Moreover, a linear model can always be viewed 
as a local approximation of a nonlinear process. There are some standard
recommendations for estimating a nonlinear process. The most important is to use a
specific-to-general modeling strategy. In particular:


1. Always start by plotting your data. Visual inspection of the data can help you
detect the nature of the nonlinearity. You can save yourself substantial modeling
time if you inspect the data for an outlier or a structural break.

2. Fit the series of interest using the best linear model possible. For example,
you might fit $\{y_t\}$ as an ARMA process using the Box-Jenkins methodology.
The coefficients should be well estimated and the residuals should show no
evidence of any serial correlation.

3. There are a number of tests designed to detect nonlinear behavior, including the
McLeod-Li, RESET, and various Lagrange Multiplier tests. A Lagrange Multiplier
test has a specific nonlinear model as its alternative hypothesis. You can test for
coefficient stability using the methods discussed in Chapter 2. Nevertheless, even
a battery of such tests is not able to reveal the precise nature of the nonlinearity.

4. If nonlinearity is detected, you have to decide on the appropriate form of the
nonlinear specification. There is no substitute for an underlying theoretical
model of the adjustment process. For example, if your model suggests
that prices increase more readily than they decrease, some form of threshold
model is likely to be the most appropriate.

5. The fitted nonlinear model(s) should fit the data better than the linear
specification and all coefficients should be statistically significant. In most instances,
you will search over a number of plausible specifications. As such, the individual
t-statistics and F-statistics are likely to be misleading. After all, you
are examining the t-statistic on the best-fitting specification. If you examine
10 different specifications, on average, you should find one that is significant
at the 10% level. Because overfitting is a distinct possibility, many researchers
would use the parsimonious SBC as a measure of fit. Moreover, traditional
t-tests and F-tests are inappropriate when there are nuisance parameters that are
not identified under the null hypothesis. Hansen (1997) considers the issue of
inference in TAR models.

6. The generalized impulse response function can help you detect whether
the nonlinear model is plausible. A useful diagnostic check is to use a
Granger-Newbold or Diebold-Mariano test (see Chapter 2) to check the
out-of-sample forecasting performance of the various models.
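The warning about searching over specifications can be made concrete with a little arithmetic. The calculation below is a sketch that assumes the 10 specification tests are independent, which is a simplification since specifications fitted to the same data are typically correlated; the variable names are illustrative.

```python
n_specs = 10     # number of specifications examined
alpha = 0.10     # significance level of each individual test

# Expected number of spuriously significant results when no effect is present
expected_false_positives = n_specs * alpha

# Probability that at least one specification appears significant by chance,
# assuming the tests are independent
prob_at_least_one = 1 - (1 - alpha) ** n_specs

print(expected_false_positives, round(prob_at_least_one, 4))
```

Under that independence assumption, the chance of finding at least one "significant" specification is well above one half, which is why the t-statistic on the best-fitting specification cannot be taken at face value.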
The nonlinear models discussed in this chapter were used to estimate a series $\{y_t\}$.
However, it is possible to apply nonlinear models to the equation for the conditional
variance. For example, the TARCH model discussed in Chapter 3 is an example of a
nonlinearity applied to the equation for the conditional variance. Hamilton and Susmel
(1994) show how to apply the Markov switching model to the conditional variance of
a time series. Higgins and Bera (1992) develop a nonlinear ARCH (NARCH) model
that posits a “constant elasticity of substitution” functional form for the model of the
conditional variance.

In addition, a large literature is growing concerning the presence of unit roots and 
cointegration in the presence of nonlinearities. For example, Granger, Inoue, and Morin 
(1997) develop some of the issues in terms of a nonlinear error correction model. Enders 
and Siklos (2001) extend the TAR unit root test discussed in Section 11 to allow for a 
cointegrated system. Tsay (1998) develops a test that can be used to detect threshold 
cointegration. The appropriate use of the test is illustrated using spot and futures prices. 
Caner and Hansen (2001) develop a maximum likelihood method to test for a threshold
unit root and Hansen and Seo (2002) extend the analysis to a cointegrated system.
Kapetanios, Shin, and Snell (2003) develop a simple way to test for a unit root against
the alternative of an ESTAR model. Another way to think about nonlinear models is in
the frequency domain. Granger and Joyeux (1980) provide an introduction to the notion
that a series may be integrated of some order other than an integer. Such nonlinear
processes may be mean-reverting yet can behave similarly to a unit root process.

Many nonlinearity tests and tests for structural change when the break date is 
unknown both entail the problem of an unidentified nuisance parameter under the null 
hypothesis. As such, the distributions of the relevant test statistics are nonstandard. The 
Andrews (1993) test and the Bai and Perron (1998) test allow you to test for structural 
breaks when the break date(s) is unknown. 


QUESTIONS AND EXERCISES 


1. Let $p_{at}$ and $p_{mt}$ denote the price of cotton in Alabama and Mississippi, respectively. The price
gap, or discrepancy, is $p_{at} - p_{mt}$. For each part, present a nonlinear model that captures the
dynamic adjustment mechanism given in the brief narrative.


a. A large price gap (in absolute value) tends to be eliminated quickly as compared to a
small gap.

b. The price gap is closed more quickly if it is positive than if it is negative.

c. It costs ten cents to transport a bale of cotton between Alabama and Mississippi. Hence,
a price discrepancy of less than 10 cents will not be eliminated by arbitrage. However,
50% of any price gap exceeding 10 cents will be eliminated within a period.

d. The value of $p_{at}$, but not the value of $p_{mt}$, responds to a price gap.


2. Draw the phase diagram for each of the following processes 
a. The GAR model: $y_t = 1.5y_{t-1} - 0.5y_{t-1}^3 + \varepsilon_t$
b. The TAR model: $y_t = 1 + 0.5y_{t-1} + \varepsilon_t$ if $y_{t-1} > 2$ and $y_t = 0.5 + 0.75y_{t-1} + \varepsilon_t$
if $y_{t-1} \le 2$.

c. The TAR model: $y_t = 1 + 0.5y_{t-1} + \varepsilon_t$ if $y_{t-1} > 0$ and $y_t = -1 + 0.5y_{t-1} + \varepsilon_t$ if
$y_{t-1} \le 0$. Notice that this model is discontinuous at the threshold. Show that $y_{t-1} = +2$
and $y_{t-1} = -2$ are both stable equilibrium values for the skeleton.

d. The TAR model: $y_t = -1 + 0.5y_{t-1} + \varepsilon_t$ if $y_{t-1} > 0$ and $y_t = +1 + 0.5y_{t-1} + \varepsilon_t$ if
$y_{t-1} \le 0$. Show that there is no stable equilibrium for the skeleton.

e. The LSTAR model: $y_t = 0.75y_{t-1} + 0.25y_{t-1}/[1 + \exp(-y_{t-1})] + \varepsilon_t$

f. The ESTAR model: $y_t = 0.75y_{t-1} + 0.25y_{t-1}[1 - \exp(-y_{t-1}^2)] + \varepsilon_t$

3. In the Markov switching model, let $p_1$ denote the unconditional probability that the system
is in regime one and let $p_2$ denote the unconditional probability that the system is in regime
two. As in the text, let $p_{ii}$ denote the probability that the system remains in regime i. Prove
the assertion


$$p_1 = (1 - p_{22})/(2 - p_{11} - p_{22})$$
$$p_2 = (1 - p_{11})/(2 - p_{11} - p_{22})$$


4. The file labeled LSTAR.XLS contains the 250 realizations of the series used in Section 9. 


a. Verify that (7.24) represents the best fitting linear model for this process. 

b. Perform the RESET using H = 3. How does this compare to the result using H = 4? 

c. If your software package can perform the BDS test, determine whether the residuals from
(7.25) pass the BDS test for white noise.

d. Perform the LM tests for LSTAR adjustment and for ESTAR adjustment. 

e. If you estimate the process as a GAR process, you should find 


$$y_t = \underset{(8.97)}{2.03} + \underset{(6.97)}{0.389}\, y_{t-1} + \underset{(3.48)}{0.201}\, y_{t-2} - \underset{(-10.57)}{0.147}\, y_{t-1}^2 + \varepsilon_t$$


All of the t-statistics imply that the coefficients are well estimated. Show that all of the
residual autocorrelations are less than 0.1 in absolute value.

f. How would you determine whether the GAR model or the LSTAR model is preferable?

5. The file GRANGER.XLS contains the interest rate series used to estimate the TAR and
M-TAR models in Section 11.

a. Estimate the TAR and M-TAR models reported in Section 11. 

b. Estimate the M-TAR model without the two insignificant coefficients. 

c. Calculate the AIC and the SBC for the TAR model and the M-TAR model without the 
insignificant coefficients. In your calculations, be sure to adjust the two model selection 
criteria to allow for the fact that you estimated the threshold. 

d. Calculate the multivariate AIC for the linear error correction model. How does this value 
compare to the multivariate AIC for the nonlinear error correction model? 

6. Consider the linear process $y_t = 0.75y_{t-1} + \varepsilon_t$. Given $y_t = 1$, find $E_t y_{t+1}$, $E_t y_{t+2}$, and $E_t y_{t+3}$.

a. Now consider the GAR process $y_t = 0.75y_{t-1} - 0.25y_{t-1}^2 + \varepsilon_t$. Given $y_t = 1$, find $E_t y_{t+1}$.
Can you find $E_t y_{t+2}$ and $E_t y_{t+3}$? [Hint: $(E_t y_{t+1})^2 \ne E_t(y_{t+1}^2)$.]

b. Use your answer to Part a to explain why it is difficult to perform multi-step-ahead
forecasting with a nonlinear model.

7. The file labeled SIM_TAR.XLS contains the 200 observations used to construct Figure 7.3. 

a. Show that it is reasonable to estimate the series as $y_t = -0.162 + 0.529y_{t-1} + \varepsilon_t$.

b. Verify that the RESET does not indicate any nonlinearities. In particular, show that
the RESET (using the second, third, and fourth powers of the fitted values) yields an
F = 1.421.

c. Plot the residual sum of squares for each potential threshold value. What is the most likely
value of the threshold(s)?

d. Estimate the model $y_t = (0.057 + 0.260y_{t-1})I_t + (-0.464 + 0.402y_{t-1})(1 - I_t)$ where
$I_t = 1$ if $y_{t-1} > -0.4012$ and zero otherwise.


e. Show that the performance of the model is improved if the intercepts are eliminated.


8. Chapter 3 of the Programming Manual contains a discussion of the appropriate way to
program smooth transition regressions, ESTAR models, and LSTAR models. If you have not
already done so, download the manual from Estima (Estima.com), www.time-series.net,
or the Wiley Web site.

a. In Section 3.7, you are asked to use the data set QUARTERLY.XLS to form the annualized
inflation rate as $\pi_t = 400[\log(ppi_t/ppi_{t-1})]$. Verify that an AR(4) model is a
reasonable linear estimate of the inflation rate.

b. Perform Teräsvirta’s (1994) test for an ESTAR/LSTAR adjustment. Verify that the test
using d = 2 yields the best fit. Does the test point to a linear, an LSTAR, or an ESTAR
model?

c. Explain why the dramatic change in inflation in 2008:4 makes the nonlinear estimation
difficult. Verify that applying Teräsvirta’s (1994) test to the pre-2008 data indicates that
adjustment is linear.


9. Chapter 5 of the Programming Manual contains a discussion of the appropriate way to
program a TAR model. If you have not already done so, download the manual from
Estima (Estima.com), www.time-series.net, or the Wiley Web site. Use the data in the file
QUARTERLY.XLS to construct the logarithmic change in the money supply as:
$gm2_t = \log(m2_t) - \log(m2_{t-1})$.

a. Estimate $gm2_t$ as an AR(||1, 3||) process. Verify that this model has very good diagnostic
properties. Explain why it is especially important to use a parsimonious representation
when estimating a nonlinear model.

b. Suppose that $gm2_t$ displays more persistence when it is below the threshold than
when it is above the threshold. Explain why the sample mean of $gm2_t$ is a biased
estimate of the actual threshold value.

c. Use Chan’s method to find the consistent estimate of the threshold. If you use delay
factors of 1 and 2 you should find $\tau = 0.02392$ and $\tau = 0.01660$, respectively. Explain why
the model with d = 2 is superior to that with d = 1.

10. In Section 3.6 of the Programming Manual it is shown how to simulate the simple LSTAR
process:

$y_t = 1 + 3\theta + 0.5y_{t-1} + \varepsilon_t$ where $\theta = 1/[1 + \exp(-0.075(t - 100))]$


a. Explain how the intercept term evolves over time. In what sense is there a structural
break in the $y_t$ process?

b. Use Teräsvirta’s (1994) test to verify that the $y_t$ series acts as a logistic process.

c. Estimate the $y_t$ series as an LSTAR process and as a threshold process using t as the
threshold variable. How do the two models compare?

11. The file OIL.XLS contains the variable SPOT measuring the weekly values of the spot price
of oil over the May 15, 1987 - Nov 1, 2013 period. In Section 4 of Chapter 3, we formed
the variable $p_t = 100[\log(spot_t) - \log(spot_{t-1})]$ and found that it is reasonable to model $p_t$
as an MA(||1, 3||) process. However, another reasonable model is the autoregressive
representation: $p_t = 0.095 + 0.172p_{t-1} + 0.084p_{t-3}$. The issue is to determine whether the $\{p_t\}$
series contains breaks or nonlinearities.

a. As illustrated in Figure 2.10, graph the CUSUMs and the recursive parameter estimates
of the AR(||1, 3||) model. You should find that there is no evidence of parameter
instability.

b. Perform the Andrews and Ploberger (1994) test for a structural break. You should find
that the sample estimate of the break date is June 21, 1991. (Note that this date is very near
the lower boundary of the 15% trimming.) Moreover, the prob-value for a single break is
0.073 so that we cannot reject the null hypothesis of no structural change.


c. Perform the Bai and Perron (1998) test. Allow for a maximum of 5 breaks, a minimum
break size of 8 weeks, and a 15% trimming. With three breaking parameters and five
potential breaks the sample value of the supremum F-test is 9.81. Given that this is less
than the critical value of 11.15, you should not reject the null hypothesis of no breaks. Note
that the SBC selects 5 breaks.


d. To test this hypothesis, estimate the $\{p_t\}$ series as a threshold process using $p_{t-1}$ as the
threshold variable. If you perform Hansen’s (1997) test, you should find that $\tau = 1.70$
and that the prob-value is approximately 0.008. As such, you can reject the null hypothesis
and accept the alternative that oil prices act as a threshold process. The estimated
model is:


$$p_t = I_t[1.56 - 0.079p_{t-1} + 0.072p_{t-3}] + (1 - I_t)[-0.191 + 0.131p_{t-1} + 0.087p_{t-3}] + \varepsilon_t$$


where $I_t = 1$ if $p_{t-1} \ge \tau$.


e. Show that it is reasonable to pare down the model such that
$p_t = 1.24I_t + (1 - I_t)[0.159p_{t-1} + 0.0876p_{t-3}] + \varepsilon_t$. Explain the dynamics of the model when $p_t$ is above
(below) the threshold.


f. What happens if you use $p_{t-2}$ as the threshold variable?
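
To see what the pared-down threshold model implies in each regime, the sketch below computes the log-difference series and the one-step conditional mean. It is an illustration under stated assumptions: the function names `pct_log_change` and `tar_forecast` are hypothetical, the coefficients 1.24, 0.159, and 0.0876 and the lags 1 and 3 are read from the printed equation, and the value of the threshold variable is passed in separately so that the sketch does not commit to a particular delay.

```python
import math

TAU = 1.70  # threshold estimate reported in the exercise


def pct_log_change(spot):
    """Return p_t = 100 * [log(spot_t) - log(spot_{t-1})] for a list of positive prices."""
    return [100.0 * (math.log(curr) - math.log(prev))
            for prev, curr in zip(spot, spot[1:])]


def tar_forecast(p_lag1, p_lag3, threshold_value, tau=TAU):
    """One-step conditional mean of the pared-down threshold model (error term omitted).

    threshold_value is the lagged value of p used as the threshold variable;
    which lag that is remains the reader's choice in this sketch.
    """
    if threshold_value >= tau:
        # Upper regime: the series is a constant plus noise, with no persistence.
        return 1.24
    # Lower regime: zero intercept, with lags 1 and 3 carrying the dynamics.
    return 0.159 * p_lag1 + 0.0876 * p_lag3


if __name__ == "__main__":
    print(tar_forecast(2.5, 0.4, threshold_value=2.5))   # upper regime
    print(tar_forecast(-1.0, 0.4, threshold_value=-1.0))  # lower regime
```

Above the threshold the forecast does not depend on past values at all. Below it the series behaves like a zero-mean autoregression, and that contrast is what part e asks you to explain.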
speed of adjustment
parameters, 354 

VAR model, 355 

exponential generalized 
autoregressive conditional 
heteroskedastic (EGARCH) 
model, 156, 156-7 

exponential smooth transition 
autoregressive (ESTAR) 
models 

adjustment process, 441 
LM test, 442 


real exchange rate, 440, 442, 
452-3 

Taylor series approximation, 
444 


forecast error variance 
decomposition, 302 

forecast function, 80, 248—50, 
459 

forward-looking solution, 43 


generalized autoregressive 
conditional heteroskedastic 
(GARCH) model 
Bollerslev’s specification, 
170, 170 
constant conditional 
correlation model, 168 
diagnostic checking, 150 
diagonal vech model 
estimation result, 171, 171 
positive variance, 167-8 
pound/franc correlation, 
171,171 
EGARCH model, 156, 156-7
fat-tailed distribution, 157-8, 
158 
IGARCH model, 154-5 
impulse response function, 
172-4, 174 
inflation estimation 
Bollerslev’s estimates, 
132-3 
Engle’s model, 131-2 
value-at-risk, 130-31 
maximum likelihood 
estimation 
classical regression model, 
152-3 
log-likelihood function, 
153-4 
nominal exchange rate, 168-9 
NYSE index 
normal distribution, 
estimation, 159, 161-2
t-distribution approach, 
159, 159-60 
oil prices, 122, 134-5 
one-step-ahead forecast, 140, 
140-41 
squared standardized errors, 
138-40 
standardized residuals, 138-9 
TARCH model, 155-7, 156 
vech model, 166-7
volatility shocks, 165-6 


generalized autoregressive 
(GAR) model, 411 
generalized impulse response 
function 
GNP growth, 453-4 
terrorist incidents 
forecast function, 459-61, 
460 
threshold model, 459 
Granger causality 
block-exogeneity test, 306 
likelihood ratio statistic, 306 
and money supply changes, 
307 
standard F-test, 306 
Granger—Newbold test, 85-6 


Hodrick—Prescott 
decomposition, 252—4, 254 
homogeneous equation see 
stochastic difference 
equation 
hypothesis testing 
Dickey—Fuller test, 207 
I(2) variables, 387-9
lag length and causality tests, 
383-4 
money demand study, 380 
multiple cointegrating vectors, 
385-7 


impulse response functions see 
also generalized impulse 
response function 
ADL models, 280—81 
and confidence intervals, 
299-301, 300 
GARCH model, 172-4, 
174 
identification restriction, 
296 
negative off-diagonal 
elements, 298 
ordering of variables, 296 
plotting, 295 
reverse Choleski 
decomposition, 297 
skyjackings, metal detector 
technology, 262 
innovation accounting, 302 
integrated generalized
autoregressive conditional 
heteroskedastic (IGARCH) 
model, 154-5 
intervention analysis 
ADL models see 
autoregressive distributed 
lag (ADL) models 
Libyan bombing effect, 
266-7 
skyjackings, metal detector 
technology 
ARIMA model, 263-6 
impulse response function, 
262 
pulse function, 263, 263 
pure jump function, 263, 
263 
transfer function analysis, 
268 
inverse characteristic equation, 
42-3 
iteration method 
first-order difference equation, 
10 
nonconvergent sequences, 
12-14, 13 


Johansen methodology 
characteristic roots, 
398 
hypothesis testing 
α and β matrices, 381
characteristic roots, 381 
differencing, 384-5 
I(2) variables, 387-9
lag length and causality 
tests, 383-4 
money demand study, 380 
multiple cointegrating 
vectors, 385—7 
lag-length test, 389-90 
normalized cointegrating 
vector, 392-3 


lag operators 
application, 42 
higher order system, 42-3 
properties, 40—41 
Lagrange multiplier (LM) test 
ARCH model, 145 
ESTAR, 442 
GARCH model, 130 
LSTAR, 442 


nonlinear model, 417-18 
power, 239 
leverage effect, 155-7, 156 
Ljung—Box Q-statistics, 72, 75, 
137, 142, 150, 280, 416, 450 
logistic smooth transition 
autoregressive (LSTAR) 
models 
AIC and SBC, 452 
auxiliary regression, 443, 451 
LM test, 442 
NLLS, 451 
numerical methods, 452 
RESET test, 450 
smoothness parameter, 440 
testing procedure, 444 
long-run equilibrium 
Engle—Granger methodology, 
361-2 
Johansen methodology, 382 
system stability, 20-21 
LSTAR models see logistic 
smooth transition 
autoregressive (LSTAR) 
models 


macroeconometric models 
estimating structural 
equations, 282-3 
GNP and money base, 284 
reduced-form GNP equations, 
283-4 
Markov switching model, 447-9 
mean square prediction error 
(MSPE), 84-85 
Monte Carlo method 
AR(1) model, 201 
Dickey—Fuller distribution, 
204-6 
nonstationary process, 200 
random walk model, 200, 201
unit roots, 202 
multiequation time-series models 
domestic and transnational 
terrorism, 260, 260 
intervention analysis see 
intervention analysis 
structural multivariate 
estimation limits 
feedback problem, 282 
VAR analysis see vector 
autoregression (VAR) 
analysis 


nonlinear autoregressive 
(NLAR) model, 410-11 
nonlinear model 
ACF and McLeod-Li test,
413-15 
ARMA model 
bilinear model, 411—12 
GAR, 411 
NLAR, 410-11 
endogenous structural breaks 
Davies problem, 466
dummy variables, 471-2 
logistic breaks, 472, 472-3 
partial and pure break 
model, 466-7 
sequential test, 468-70 
supremum test, 467—9 
threshold breaks, 466-7 
transnational terrorism 
series, 470-71 
Lagrange multiplier tests, 
417-18 
vs. linear, 408-10 
portmanteau tests, 416 
regime switching models 
artificial neural network, 
445-7 
Markov switching model, 
447-9 
STAR models see smooth 
transition autoregressive 
(STAR) models 
TAR models see threshold 
autoregressive (TAR) 
models 
unidentified nuisance 
parameters 
Davies problem, 418 
endogenous break, 419 
Monte Carlo method, 420 
supremum test, 420 


panel unit root tests 
ADF test, 244 
critical values, 244, 245 
IPS test, 243-4 
limitations, 246-7 
real exchange rates, 245, 245 
parsimonious model, 69, 76, 97, 
129 
partial autocorrelation function 
(PACF) 
first-order autoregression, 
64-5 


properties, 66, 66 
seasonality, 98-9, 99 
second-order autoregression, 
65 
Perron’s test 
drift term vs. trend line, 231 
level dummy variable, 230 
null hypothesis, 229-30 
pulse dummy variable, 
229-30 
power 
definition, 235-6 
DF-GLS test, 241 
Dickey—Fuller regressions, 
238, 243 
Lagrange multiplier test, 239 
Schmidt—Phillips model, 
239-40, 244-3 
purchasing power parity (PPP) 
cointegration, 345-6 
Engle—Granger methodology 
equilibrium regressions, 
371 
lag length tests, 372 
long-run equilibrium, 
370-71 
real exchange rates, 370 
speed of adjustment 
coefficient, 372-3 
unit root tests, 370 
real exchange rates, 211-12, 
212 


random walk process 
autocorrelation function, 185 
cointegration, 348—9 
plus noise, 187-8 
spurious regression, 197-8 
stochastic trends, 184-5, 351 
recursive forecasts, 454 
regime switching models 
artificial neural network, 
445-7 
Markov switching model, 
447-9 
TAR models, 420-1 
regression error specification test 
(RESET), 415-16 
reverse causality, 282 


Schmidt—Phillips model, 
239-40 

Schwarz Bayesian criterion
(SBC) 


vs. AIC, 73 

diagnostic statistics, 99—100, 
100 

estimated coefficients, 72, 72, 
74, 74, 92-3 


goodness-of-fit, 78 
model selection, 69—70 
weighting factor, 111-12 
seasonal unit root process 
characteristic roots, 223 
HEGY test, 223-4 
nonseasonal and semiannual 
roots, 226 
seasonal difference, 222—3 
Taylor series, 224-5 
seasonality 
autoregressive coefficients, 97 
differencing 
ACF and PACF, 98-9, 99 
airline model, 102 
Q-statistics, 99—100 
seasonal difference, 101—2 
multiplicative model, 97 
seasonal pattern, 96-7 
seemingly unrelated regressions 
(SUR), 291, 303 
Sims—Bernanke decomposition 
Choleski decomposition, 319 
coefficient restriction, 320 
structural shocks, 318-19 
symmetry restriction, 321 
variance restriction, 320—21 
variance/covariance matrix, 
317-18 
smooth transition autoregressive 
(STAR) models 
ESTAR 
adjustment process, 441 
LM test, 442 
real exchange rate, 440, 
442, 452-3 
Taylor series 
approximation, 444 
LSTAR 
AIC and SBC, 452 
autocorrelations, 450 
auxiliary regression, 443, 
451 
LM test, 442 
NLLS, 451 
numerical methods, 452 
smoothness parameter, 440 
squared residuals, 450 
testing procedure, 444 
NLAR, 439
spurious regression 
autocorrelation, 196 
regression equation, 195 
stationary and nonstationary 
variables, 198-9 
STAR models see smooth 
transition autoregressive 
(STAR) models 
stationary time-series model 
ARMA model see 
autoregressive moving 
average (ARMA) model 
covariance stationary, 52-3 
particular solution, 54 
stability condition, 54—5 
stochastic difference equation 
see stochastic difference 
equation 
weakly stationary, 52—3 
stochastic difference equation 
cobweb model 
constant coefficients, 19—20 
impulse response function, 
22 
long-run equilibrium price, 
18-19, 19, 21 
one period multiplier, 21-2 
deterministic process 
components, 31—2 
linear time trend, 33-4 
homogeneous solution 
characteristic roots, 23-4 
convergence, 25-6, 26 
higher order equation, 
30-31 
nth-order equation, 14—15 
second-order equation, 
22-4 
stability condition, 27—30, 
29-30 
trigonometric functions, 
26-7 
iteration method 
first-order difference 
equation, 10 
nonconvergent sequences, 
12-14, 13 
lag operators 
forward-looking solution, 
43 
higher order systems, 42—3 
particular solution, 41-3 
properties, 40—41 
nonlinear dynamics, 6—7 
undetermined coefficients 
challenge solution, 34-5 
general solution, 35, 37 
higher order system, 37-8 
particular solution, 35, 37 
stochastic trends 
business cycle, 192-4 
cointegration, 351-3 
decompositions see 
decompositions 
Dickey—Fuller test see 
Dickey—Fuller test 
differencing 
ARIMA(p, d, q) models, 
190 
random walk plus drift 
model, 189 
vs. stationary series, 191-2 
permanent/nondecaying 
component, 183 
random walk model, 184—5 
random walk plus drift model, 
185-7 
random walk plus noise, 
187-8 
trend plus noise model, 188-9 
trend stationary model, 183 
unit roots see unit roots 
structural multivariate estimation 
limits 
feedback problem, 282 
macroeconometric models 
estimating structural 
equations, 282—3 
reduced-form GNP 
equations, 283-4 
variables as endogenous, 
284 
structural VAR, 286 
BQ decomposition see 
Blanchard—Quah (BQ) 
decomposition 
structural decompositions 
Choleski decomposition, 
314-16 
forecast errors and 
structural innovations, 
316 
n-variable VAR, 314-15, 
317 


reduced-form VAR model, 
313 

Sims—Bernanke 
decomposition see 
Sims—Bernanke 
decomposition 


TARCH model see threshold 
generalized autoregressive 
conditional heteroskedastic 
(TARCH) model 

threshold autoregressive (TAR) 
models 

AR(1) process, 422-3
asymmetric monetary policy 
estimated model, 437 
real GDP, 437 
SSR, AIC, and SBC 
regressions, 438 
Taylor rule, 436, 438 
BL model, 423 
delay parameter, 427 
endogenous breaks, 432-3 
estimation 
high and 
low-unemployment 
regime, 426-7 
ordered threshold values, 
429 
regime dependent 
variances, 424, 424 
GAR process, 422-3 
impulse responses see 
generalized impulse 
response function 
multiple regimes, 427-8 
pretesting, 430-32 
recursive forecasts, 454 
regime switching model, 
420-21 
unemployment rate 
autocorrelation, 434 
McLeod-Li test, 434
null hypothesis, 435 
U.S. unemployment rate, 
433 
unit roots 
adjustment process, 
463-4 
Dickey—Fuller test, 461-2 
M-TAR model, 462-4 
nonlinear error-correction, 
465-6 
phase diagram, 462, 462 


real exchange rates, 461 
threshold generalized 
autoregressive conditional 
heteroskedastic (TARCH) 
model, 155-7, 156 
transfer function analysis 
distributed lag, 268 
endogenous and exogenous
variable, 268 
GNP and money base, 284 
lag lengths, 284 
leading indicator, 268 
postestimation evaluation, 78 
trend plus noise model, 188-9 
trend stationary (TS) model, 183 


unbiased forward rate (UFR) 
hypothesis, 6, 345 
undetermined coefficients 
arbitrary constant, 36 
challenge solution, 34—5 
higher order system, 37—8 
stochastic term, 39—40 
unit roots 
Monte Carlo method, 202 
spurious regression 
autocorrelation, 196 
random walk process, 
197-8 
stationary and nonstationary 
variables, 198—9 
TAR models 
adjustment process, 463-4 
Dickey—Fuller test, 461-2 
M-TAR model, 462-4 
nonlinear error-correction, 
465-6 
phase diagram, 462, 462 
real exchange rates, 461 


variance decomposition, 301 
vector autoregression (VAR) 
analysis 
bivariate system, 285 
covariance matrix, 286-7 
domestic and transnational 
terrorism, 309-10 
empirical methodology,
310-11
empirical results, 311—13, 
312 


dynamics, 289-90, 289 
forecasting, 291-2 
identification 


Choleski decomposition, 
294 
primitive system, 292-3 
recursive system, 293 
lag length, 290 
multivariate generalization, 
290 
OLS estimates, 285, 290 
overidentified system 
identification procedure, 
321-2 
Sims’s model, 324-5
stability and stationarity, 
287-8 
structural VAR see structural 
VAR 
testing hypotheses 
AIC and SBC, 305 
asymptotic χ² distribution,
304 


Granger causality, 305-7 
likelihood ratio test, 304—5 
near-VAR, 303 
n-equation VAR, 303 
with nonstationary 
variables, 307—9 
SUR, 303 
vector moving average 
impact multipliers, 295 
impulse response functions 
see impulse response 
functions 
moving average 
representation, 
295 
volatility 
ARCH process see 
autoregressive conditional 
heteroskedastic (ARCH) 
model 
GARCH model see generalized
autoregressive conditional
heteroskedastic (GARCH) model

stylized facts 

daily exchange rates, 121,
121-2 

exchange rate series, 122 

oil, spot price, 122 

real GDP, consumption, and 
investment, 118—20, 
119-20 

short and long-term interest 
rates, 119, 121, 121


weakly exogenous, 394—5 


Yule—Walker equation, 
62-4 